Kotlin / kotlinx.serialization

Kotlin multiplatform / multi-format serialization
Apache License 2.0
5.32k stars 619 forks source link

Strict mode for CBOR #2662

Open timmc opened 4 months ago

timmc commented 4 months ago

What is your use-case and why do you need this feature?

My app needs to be able to sign and verify serialized data, which requires that the parse be unambiguous. The main problem is duplicate map keys. If there are repeated keys, then two recipients may disagree on what the signed data says, which is obviously a problem. :-) I've written on this kind of parser mismatch vulnerability previously: https://www.brainonfire.net/blog/2022/04/29/preventing-parser-mismatch/

The CBOR implementation currently accepts duplicate keys by using the last one encountered, but the spec says in §2.1:

A map that has duplicate keys may be well-formed, but it is not valid

§3.10 "Strict Mode" then waffles a bit and puts the onus on the sender to "avoid ambiguously decodable data" and defines an optional strict mode that decoders are not required to implement, particularly calling out duplicate key detection as a possibly heavy-weight requirement.

However, kotlinx.serialization already has the ability to detect duplicate keys because it builds up a HashMap or similar which can simply be consulted before insertion. (There's no streaming interface that I'm aware of.) The language in §3.10 is likely aimed at embedded and other highly constrained environments, which isn't particularly relevant to Kotlin.

Describe the solution you'd like

  1. Generally support rejection of repeated map keys, as an optional feature.
  2. Expose this as part of a new strict-mode feature in the Cbor implementation, enabled by default to comply with §2.1.

(I'd also love for the duplicate key rejection to be enabled by default for all formats, as this is a known vulnerability in many, many protocols and people should be protected by default—but still able to explicitly opt out of this protection if they know what they're doing. Major version bump, I know.)

Note: https://github.com/Kotlin/kotlinx.serialization/issues/1990 covers rejection of duplicate keys, but it wasn't clear if that was meant to be specific to JSON.

timmc commented 4 months ago

(Updated with more info from the spec.)

sandwwraith commented 4 months ago

Yes, it sounds very reasonable. I do not mind adding it at some point in the future or via contribution.

timmc commented 4 months ago

Great, I may try to work up a PR. Where would you recommend getting started? I found insertKeyValuePair in CollectionSerializers—but it didn't seem that the tests actually exercise that code path, and I can't find anything that calls insertKeyValuePair.

pdvrieze commented 4 months ago

@timmc What happens from a format perspective is that it recognizes that this is a collection serializer. Then it will call the update decoder (at some point it was part of DeserializerStrategy, but it is no longer) with the previous list value.That update will create/update the collection with the new value (or not). It would be possible to create a new version of this function that takes a strict argument (with the default implementation of the previous one passing false for strictness).

timmc commented 4 months ago

Would you be able to point me to a code location? I'm not finding useful references to update, CollectionSerializer, etc. I just discovered that LinkedHashMapSerializer extends LinkedHashMap, but I'm having trouble finding the code that actually puts key-value pairs into a map when deserializing map data.

timmc commented 3 months ago

OK, found it -- builder[key] = value in MapLikeSerializer. Starting a branch here: https://github.com/Kotlin/kotlinx.serialization/compare/dev...timmc:timmc/cbor-strict

timmc commented 3 months ago

I've found it straightforward to reject duplicate keys. Making it configurable would be a larger undertaking—I can either plumb it through via the serializers or via the descriptors, but there's not really a clean way to do it since there's no config object being passed along. Instead, I would have to add allowDuplicateKeys: Boolean to a large number of method and constructor signatures, and I still haven't found the end of that chain.

Would you accept a PR that just forbids duplicate keys in maps and objects, with an option to later add a configuration option that permits duplicate keys?

sandwwraith commented 3 months ago

@timmc I don't think it is a good idea to change existing behavior so that people would start receiving exceptions on a code that worked before, especially without an option to bring old behavior back. So it's better to have it. Flags are already stored in Cbor object, which is available in every CborReader. So you need to pass this object further down to CborDecoder, and that's likely it.

pdvrieze commented 3 months ago

@timmc The code involved is:

https://github.com/Kotlin/kotlinx.serialization/blob/51cb8e8e556983fc83a565d5f04bb089363453e0/core/commonMain/src/kotlinx/serialization/internal/CollectionSerializers.kt#L99-L113

and

https://github.com/Kotlin/kotlinx.serialization/blob/51cb8e8e556983fc83a565d5f04bb089363453e0/core/commonMain/src/kotlinx/serialization/internal/CollectionSerializers.kt#L26-L51

It is important to note that for some formats it is valid to read list/map items one by one (for example in Protobuf): https://github.com/Kotlin/kotlinx.serialization/blob/51cb8e8e556983fc83a565d5f04bb089363453e0/formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt#L210-L218

This later case requires special handling of the MapSerializer. So you have two separate cases to support this generically (reading multiple items at a time, directly in MapSerializer; and repeatedly reading sublists).

sandwwraith commented 3 months ago

I'm not sure it is a good idea to add changes to CollectionSerializer since it is used by all formats. You also need to check whether regular objects have duplicate keys. It seems you have to track keys manually, with some kind of Set.

pdvrieze commented 3 months ago

@sandwwraith Indeed modifying CollectionSerializer would be a major undertaking with various compatibility concerns/challenges.

timmc commented 3 months ago

I don't think CborMapReader has enough information to build and check a list of keys. By the time decodeElementIndex has been called (in the parent class CborListReader), MapLikeSerializer.readElement has already called decodeSerializableElement to read the key.

Maybe it would make sense for MapLikeSerializer to call a new CompositeDecoder.visitedKey method that accepts a key and then has the option of throwing an exception?

pdvrieze commented 3 months ago

Maybe it would make sense for MapLikeSerializer to call a new CompositeDecoder.visitedKey method that accepts a key and then has the option of throwing an exception?

I like this solution (even though some bits of the name/semantics could be "improved"). The function can have a default implementation that results in the current behaviour. Note that you should probably also think about the set serializer (sets also don't allow duplicate values/only keep the last one).

I had a look at the library source code. You probably want to do something that checks the result of put/add which normally return the previous value. Doing this will maintain the efficiency of the implementation (contains checks are not cheap)

timmc commented 3 months ago

Yeah, it seems to work out OK -- I have https://github.com/Kotlin/kotlinx.serialization/pull/2681 which still needs some work but should be on the right path.