fxamacker / cbor

CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.
MIT License
748 stars 61 forks source link

feature: Configurable handling of unrecognized simple values #477

Closed benluddy closed 6 months ago

benluddy commented 10 months ago

Is your feature request related to a problem? Please describe.

Currently, decoding well-formed but unrecognized CBOR simple values into interface{} produces a concrete value of type cbor.SimpleValue, and decoding the same into any width int, uint, float, or a math/big.Int converts the numeric value to the destination type as needed.

I would like to be able to configure what the decoder does when it encounters an unrecognized simple value.

Describe the solution you'd like

A decode option with modes:

  1. (default) Current behavior
  2. Return an error
  3. Interpret as a substitute value (https://www.rfc-editor.org/rfc/rfc8949.html#name-converting-from-cbor-to-jso suggests null when converting to JSON, and this is consistent with the current treatment of the "undefined" simple value as nil)

Describe alternatives you've considered

For the interface{} case, it would be possible to traverse the object output by the decoder to find any cbor.SimpleValue values, but that traversal would be fairly expensive.

For the numeric destination type case, the application doesn't have any way to detect that an unrecognized simple value data item was received versus a numeric CBOR type. I think this is a little surprising given that a simple value's assigned number is more of an identifier than a quantity.

Additional context

fxamacker commented 10 months ago

Thanks for opening this issue!

The first two options (current behavior or returning error) sounds good.

The third option (substitute value) brings up some questions and edge cases:

Thoughts?

benluddy commented 10 months ago
  • do we want option to use a substitute value other than nil?

How would you feel about a map from simple value numbers to Go values?

The simple value space is so small that it would not be unreasonable to assign any substitute value by enumerating all of the possible simple values. There would be no need to include a mode to interpret unknown simple values as an explicit default value. As a secondary benefit, it would also make it possible for applications to configure other Go analog values for true, false, null, and (especially) undefined.

There would potentially be no need for an "unrecognized simple value" mode at all. By default, the simple value map could be populated with entries mapping to cbor.SimpleValue(N), true, false, and nil, and the behavior on unrecognized simple value could always be error.

  • how do we want to handle map key collision if unrecognized simple values are used as key?

Does my proposed answer to the first question help answer this one? Since the only allowable simple values would be those the user has explicitly configured, the user would control whether or not any two simple values are treated as duplicates for their specific data model.

fxamacker commented 10 months ago

@benluddy That sounds great! :+1:

Does my proposed answer to the first question help answer this one?

Yes because users can specify unique substitute values for unassigned simple value identifiers.

To confirm, the new decoding option would specify a substitute value for unassigned simple value identifiers, which would only be used when decoding into interface{}. The unassigned simple value identifiers being the numbers:

If you open a PR for this, I'd be happy to review it (probably this weekend)!

benluddy commented 10 months ago

Thanks!

What do you think about unifying this option for both parse-to-value and parse to empty interface? For parse-to-value, I think it would be practical to do a conversion from the substitute value to the destination value (with overflow checks for numeric types) in the general case, and special case substitute values of type SimpleValue using fillPositiveInt to avoid breaking the current usage.

I'll open a draft PR now to illustrate what I'm imagining.

benluddy commented 10 months ago

I'll open a draft PR now to illustrate what I'm imagining.

Just opened https://github.com/fxamacker/cbor/pull/481. I think something like this would be backwards-compatible, and makes for consistent configuration regardless of the decode destination type.

fxamacker commented 6 months ago

Thanks Ben! Closed by #481.