aaronhuggins / cbor-redux

The Concise Binary Object Representation (CBOR) data format (RFC7049) implemented in pure JavaScript, revived.
https://doc.deno.land/https://deno.land/x/cbor_redux/mod.ts
Other
29 stars 6 forks source link

Forbid duplicate map keys during decoding #11

Closed letmaik closed 1 year ago

letmaik commented 3 years ago

Maps with duplicate maps are not valid according to the CBOR spec. While the spec leaves it to the specific protocol where exactly this is dealt with I think it would be trivial to add this check to cbor-redux in this code path: https://github.com/aaronhuggins/cbor-redux/blob/22ba58fb7d8d30e2fc85c2f9dbad6f7c53785037/src/CBOR.ts#L230-L236

himself65 commented 3 years ago

Faced the same issue.

aaronhuggins commented 2 years ago

@letmaik I've read the spec, given this a ton of thought, and I feel that overwriting is an appropriate handle per spec. It seems to leave room for implementation-specific parsing and I believe that an overwrite on duplicate key is appropriate fault-tolerance.

While there are more complex handling scenarios in other implementations (see for example https://github.com/fxamacker/cbor/issues/122), I question the complexity of such a feature at this time.

The fault-tolerance in the lib is also in-line with what developers should expect for JSON.parse. Example:

const json_dup = `
{
  "key1": "val1",
  "key2": "val2a",
  "key2": "val2b",
  "key3": "val3"
}
`
console.log(JSON.parse(json_dup))

For these reasons, I would like to close this issue as wont fix; however, I feel that adding a feature for optional strictness like fxmacker/cbor with a default for fault-tolerance would be acceptable. Anyone submitting a PR implementing this feature would be welcome.

letmaik commented 2 years ago

This feature would be required to implement COSE validators in a secure manner: https://datatracker.ietf.org/doc/html/rfc8152

Applications MUST NOT generate messages with the same label used twice as a key in a single map. Applications MUST NOT parse and process messages with the same label used twice as a key in a single map. Applications can enforce the parse and process requirement by using parsers that will fail the parse step or by using parsers that will pass all keys to the application, and the application can perform the check for duplicate keys.

aaronhuggins commented 2 years ago

I believe from a language perspective that the first must not is satisfied by the encode process.

The second must not I am interpreting as "don't present more than one key/value with the same label to the application". Can you help me understand how this is flawed? I must be missing something (and there's a few other CBOR libs that would have also missed this, not an excuse, just part of my reasoning).

The decode process would need to add some intermediate code for either the hard fail or the passing of keys (maybe via some "reviver" style function [?] that can pump out the key/value pairs as they are decoded) to the application.

Thanks for the cycle of feedback, @letmaik.

aaronhuggins commented 2 years ago

I'm in favor of the following, with proper documentation:

  1. Add a configuration API with mode: 'loose'|'strict', where loose is the default
  2. Loose would be the current behavior, nothing happens; strict would hard fail on subsequent encounter of the same key in a map
  3. Add an observer API which would pump out key/value pairs, allowing an application to handle duplicates in loose mode

Something like:

interface CBOROptions {
  /** Set the dictionary type for supported environments; defaults to `object`. */
  dictionary?: 'object' | 'map';
  /** Set the mode to `strict` to hard fail on duplicate keys in CBOR dictionaries; defaults to `loose`. */
  mode?: 'loose' | 'strict';
  /** For environments which support Web Streams, get a new ReadableStream for every decode. */
  observe?: (stream: ReadableStream<{ key: any, value: any }>) => void;
}

/** Sets or gets the internal options of CBOR decode/encode. */
CBOR.options (options?: CBOROptions): readonly CBOROptions;

What do you think of this idea @letmaik?

letmaik commented 2 years ago

Setting options as global state is quite bad I think. It should be an argument to the relevant encode/decode functions. Otherwise two unrelated parts in an application can't use the library with different options at the same time. Otherwise, the dictionary and mode options look good. I haven't used streams really to make a recommendation on the observe API, but intuitively I would suggest to leave such an API out for now until someone asks for it. It's easy to mess up the design on such things.

aaronhuggins commented 2 years ago

Yuck, you're right, module-wide options wouldn't work well in this context at all.

What do you think of putting all the options in an options bag to be passed to decode? This would be a breaking change to the decode API; but I'm planning on making this a 1.0 release, so I think that's acceptable.

Current API:

CBOR.decode<T = any>(
  data: ArrayBuffer | SharedArrayBuffer,
  tagger?: TaggedValueFunction,
  simpleValue?: SimpleValueFunction,
) => T

Options bag API:

interface CBOROptions {
  /** Set the dictionary type for supported environments; defaults to `object`. */
  dictionary?: 'object' | 'map';
  /** Set the mode to `strict` to hard fail on duplicate keys in CBOR dictionaries; defaults to `loose`. */
  mode?: 'loose' | 'strict';
  /** For environments which support Web Streams, get a new ReadableStream for each encountered object. */
  observe?: (stream: ReadableStream<{ key: any, value: any }>, key?: any) => void;
  /** A function that extracts tagged values. This function is called for each member of an object. */
  tagger?: TaggedValueFunction;
  /** A function that extracts simple values. This function is called for each member of an object. */
  simpleValue?: SimpleValueFunction;
}

CBOR.decode<T = any>(
  data: ArrayBuffer | SharedArrayBuffer,
  options?: CBOROptions,
) => T

Side note: the more I think about the observe API, the more I dislike it and the less likely I am to include it in it's current proposed form.

aaronhuggins commented 2 years ago

Yeah, I'm going to drop observe altogether and add reviver to the decode API and replacer to the encode API. I have a use case for that functionality, and it will be more ergonomic since it replicates something I like about the JSON APIs.

letmaik commented 2 years ago

The options bag looks fine, I'm not worried about breaking changes for 1.0.

aaronhuggins commented 2 years ago

@letmaik Hey, may I pick your brain a little bit more? What do you think of the docs here? https://doc.deno.land/https://raw.githubusercontent.com/aaronhuggins/cbor-redux/0c8b3f4fa619fee986dd91e254d999fc9cfd396e/mod.ts

That document should be largely applicable to Node/Deno/Web. Would be very happy if you don't mind providing more feedback, but I understand if that's too much to ask.

I've got everything implemented per #17 for this issue and many others. I think I'm close to ready for a 1.0.0 release candidate. Really want to get CI/CD set up before cutting a prelease.

Either way, I'll update this issue once a prerelease is cut so you can take it for a spin.

letmaik commented 2 years ago

The docs look ok to me, I think some code examples wouldn't hurt.