denoland / std

The Deno Standard Library
https://jsr.io/@std
MIT License
3.17k stars 621 forks source link

feat(cbor/unstable): introduce `@std/cbor` #5909

Closed BlackAsLight closed 3 weeks ago

BlackAsLight commented 2 months ago

Closes https://github.com/denoland/std/issues/5479

This pull request introduces a CBOR implementation based off the RFC 8949 spec from scratch. It introduces functions like encodeCbor, encodeCborSequence, decodeCbor, and decodeCborSequence, and a CborTag class to provide additional semantic information.

It also introduces streaming versions called:

It should be noted the different naming convention used between the "encoder", "decoder" and "decoded". The "encoder" and "decoder" classes are TransformStreams, while the "decoded" classes are ReadableStreams and act merely as a way for the user to figure out what type they have.

Due to the way streams work, if one of the "decoded" streams are yielded then they'll need to either be entirely consumed or cancelled before the next value will be yielded. This should be noted when using things like Array.fromAsync. Such a function would work only if you can guarantee that no value yielded will be one of these "decoded" streams. If such a value is yield in such a function then it will hang indefinitely.

Example 1

import { assert, assertEquals } from "@std/assert";
import { decodeCbor, encodeCbor } from "@std/cbor";

const rawMessage = [
  "Hello World",
  35,
  0.5,
  false,
  -1,
  null,
  Uint8Array.from([0, 1, 2, 3]),
];

const encodedMessage = encodeCbor(rawMessage);
const decodedMessage = decodeCbor(encodedMessage);

assert(decodedMessage instanceof Array);
assertEquals(decodedMessage, rawMessage);

Example 2

import { assert, assertEquals } from "@std/assert";
import { CborTextEncoderStream, CborTextDecodedStream, CborSequenceDecoderStream } from "@std/cbor";

const reader = CborTextEncoderStream.from(["Hello World!"])
  .readable
  .pipeThrough(new CborSequenceDecoderStream()).getReader();

const { done, value } = await reader.read();
assert(done === false);
assert(value instanceof CborTextDecodedStream);
assertEquals((await Array.fromAsync(value)).join(''), "Hello World!")

assert((await reader.read()).done === true);
reader.releaseLock();

Example 3

import { encodeBase64Url } from "@std/encoding";
import {
  CborArrayDecodedStream,
  CborArrayEncoderStream,
  CborByteDecodedStream,
  CborByteEncoderStream,
  CborMapDecodedStream,
  CborMapEncoderStream,
  type CborOutputStream,
  CborSequenceDecoderStream,
  CborSequenceEncoderStream,
  CborTag,
  CborTextDecodedStream,
  CborTextEncoderStream,
} from "@std/cbor";

const rawMessage = [
  undefined,
  null,
  true,
  false,
  3.14,
  5,
  2n ** 32n,
  "Hello World",
  new Uint8Array(25),
  new Date(),
  new CborTag(33, encodeBase64Url(new Uint8Array(7))),
  ["cake", "carrot"],
  { a: 3, b: "d" },
  CborByteEncoderStream.from([new Uint8Array(7)]),
  CborTextEncoderStream.from(["Bye!"]),
  CborArrayEncoderStream.from([
    "Hey!",
    CborByteEncoderStream.from([new Uint8Array(18)]),
  ]),
  CborMapEncoderStream.from([
    ["a", 0],
    ["b", "potato"],
  ]),
];

async function logValue(value: CborOutputStream) {
  if (
    value instanceof CborByteDecodedStream ||
    value instanceof CborTextDecodedStream
  ) {
    for await (const x of value) console.log(x);
  } else if (value instanceof CborArrayDecodedStream) {
    for await (const x of value) logValue(x);
  } else if (value instanceof CborMapDecodedStream) {
    for await (const [k, v] of value) {
      console.log(k);
      logValue(v);
    }
  } else if (value instanceof CborTag) {
    console.log(value);
    logValue(value.tagContent);
  } else console.log(value);
}

for await (
  const value of ReadableStream.from(rawMessage)
    .pipeThrough(new CborSequenceEncoderStream())
    .pipeThrough(new CborSequenceDecoderStream())
) {
  logValue(value);
}
codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 90.26275% with 126 lines in your changes missing coverage. Please review.

Project coverage is 96.55%. Comparing base (065296c) to head (ec68c5c). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cbor/sequence_decoder_stream.ts 75.78% 83 Missing and 2 partials :warning:
cbor/sequence_encoder_stream.ts 85.82% 18 Missing :warning:
cbor/_common_decode.ts 95.81% 10 Missing :warning:
cbor/_common.ts 90.66% 7 Missing :warning:
cbor/_common_encode.ts 96.63% 4 Missing :warning:
cbor/byte_encoder_stream.ts 97.95% 1 Missing :warning:
cbor/text_encoder_stream.ts 98.03% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #5909 +/- ## ========================================== - Coverage 96.75% 96.55% -0.21% ========================================== Files 509 530 +21 Lines 39175 40483 +1308 Branches 5795 6067 +272 ========================================== + Hits 37905 39087 +1182 - Misses 1228 1352 +124 - Partials 42 44 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

kt3k commented 2 months ago

I wonder if these need to be classes when they both don't have any instance states.

How about making them just deocde and encode functions? That mirrors the structure of std/msgpack

BlackAsLight commented 2 months ago

I wonder if these need to be classes when they both don't have any instance states.

How about making them just deocde and encode functions? That mirrors the structure of std/msgpack

Sure. I was trying to mirror the structure of TextEncoder/TextDecoder, but this other structure seems more common in the std.

kt3k commented 2 months ago

Side note: TextDecoder stores some chunk from the previous decode when stream option is specified. Probably that is why it's implemented as a class

iuioiua commented 1 month ago

@BlackAsLight, could you please draft this PR and un-draft once ready for us to review?

BlackAsLight commented 1 month ago

Is there a way I can get the Sentry bot to update or create a new report on what lines are missing?

kt3k commented 1 month ago

@BlackAsLight

Coverage diff of this PR should be visible in this page https://app.codecov.io/gh/denoland/std/pull/5909

CborSequenceDecoderStream CborByteDecodedStream CborTextDecodedStream CborArrayDecodedStream CborMapDecodedStream

There are mixed naming styles, FooDecoderStream and FooDecodedStream. Are these differences intentional?


We'd like to export a single API from a single file. Can you split files based on exported APIs

BlackAsLight commented 1 month ago

@BlackAsLight

Coverage diff of this PR should be visible in this page https://app.codecov.io/gh/denoland/std/pull/5909

That page hasn't gotten updated in like 5 days now, which is why I asked.

CborSequenceDecoderStream CborByteDecodedStream CborTextDecodedStream CborArrayDecodedStream CborMapDecodedStream

There are mixed naming styles, FooDecoderStream and FooDecodedStream. Are these differences intentional?

The difference in naming style is intentional. The "Encoder" and "Decoder" ones are TransformStreams that actually do the work of converting, while the "Decoded" ones are ReadableStreams that act as a simple wrapper so the user of the lib is able to know what type of chunks to expect.

I did explore the idea of having the "Decoded" ones also be TransformStreams and handling the logic of conversion, but the logic there seemed too complicated as you don't know how far to go until you've actually decoded it.

I also explored the idea of having the other "Encoder" ones (i.e. CborByteEncodedStream) act as a mere wrapper for CborSequenceEncoderStream, but that then seemed redundant if all you wanted to send for example a byte string, as more checks would essentially need to be done for no reason.

We'd like to export a single API from a single file. Can you split files based on exported APIs

* `encode_cbor.ts` exports `encodeCbor`

* `encode_cbor_sequence.ts` exports `encodeCborSequence`

* `tag.ts` exports `CborTag`

* `sequence_encoder.ts` exports `CborSequenceEncoderStream`

* `byte_encoder_stream.ts` exports `CborByteEncoderStream`

* `text_encoder_stream.ts` exports `CborTextEncoderStream`

* `array_encoder_stream.ts` exports `CborArrayEncoderStream`

* `map_encoder_stream.ts` exports `CborMapEncoderStream`

* `decode_cbor.ts` exports `decodeCbor`

* `decode_cbor_sequence.ts` exports `decodeCborSequence`

* `sequence_decoder_stream.ts` exports `CborSequenceDecoderStream`

* `byte_decoded_stream.ts` exports `CborByteDecodedStream`

* `text_decoded_stream` exports `CborTextDecodedStream`

* `array_decoded_stream` exports `CborArrayDecodedStream`

* `map_decoded_stream` exports `CborMapDecodedStream`

That is a lot of files, but I can do that.

kt3k commented 4 weeks ago

I'll move the files *_decoded_stream.ts to the private paths (_*_decoded_stream.ts) and remove them from exports entries in cbor/deno.json as they are not intended to be used directly from the users. Let me know if you disagree with this change.

BlackAsLight commented 4 weeks ago

I'll move the files *_decoded_stream.ts to the private paths (_*_decoded_stream.ts) and remove them from exports entries in cbor/deno.json as they are not intended to be used directly from the users. Let me know if you disagree with this change.

The user isn't meant to create instances of the decoded streams themselves, but they do need access to them to be able to figure out what type they have. For example with an entry instanceof ByteDecodedStream.

Edit: maybe a Boolean function instead could be used instead by the user? isByteDecodedStream(entry). That's way there is no risk of incorrect use of the classes?

kt3k commented 4 weeks ago

The user isn't meant to create instances of the decoded streams themselves, but they do need access to them to be able to figure out what type they have. For example with an entry instanceof ByteDecodedStream.

instanceof check makes sense. Currently they are still exported from mod.ts. Maybe it's also good to export them from relevant decoder streams' endpoints.

kt3k commented 4 weeks ago

@BlackAsLight I added the exports of CborArrayDecodedStream, CborByteDecodedStream, CborMapDecodedStream, CborTextDecodedStream from cbor/sequence-decoder-stream. What do you think?

BlackAsLight commented 4 weeks ago

@BlackAsLight

I added the exports of CborArrayDecodedStream, CborByteDecodedStream, CborMapDecodedStream, CborTextDecodedStream from cbor/sequence-decoder-stream. What do you think?

Looks good to me