**Closed** — DavidBuchanan314 closed this issue 8 months ago
yeeeah. i recently read the dag-cbor specification. key order is no longer strict in maps. but you are right, an error must be thrown on key duplication. i have a TODO in the code for it xd

moreover, my current impl allows map keys of types other than strings. i'll change that soon too, and i hope it will give a small perf boost as well

btw, adding https://github.com/ipld/codec-fixtures/tree/master/fixtures to the tests is on my todo list. i already have some fixtures from popular json benchmarks: https://github.com/MarshalX/python-libipld/tree/main/data
I'm pretty sure key ordering is still strict? https://ipld.io/specs/codecs/dag-cbor/spec/#strictness
> The keys in every map must be sorted length-first by the byte representation of the string keys
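That sort order can be sketched in a couple of lines. This is an illustrative helper (not from the thread, just a demonstration of the rule quoted above): keys are ordered length-first, then lexicographically by the raw bytes of their UTF-8 encoding.

```python
def dag_cbor_key_order(keys: list[str]) -> list[str]:
    # sort length-first, then by the byte value of the UTF-8 encoding,
    # as required by the DAG-CBOR strictness rules
    return sorted(keys, key=lambda k: (len(k.encode('utf-8')), k.encode('utf-8')))

print(dag_cbor_key_order(['aaa', 'x', 'ab', 'b']))  # ['b', 'x', 'ab', 'aaa']
```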
ah. i was confused because of ~~different strictness for encoding and decoding. for decoding it's fine to not keep order.~~ thank you!

i was confused because of:

> Map key ordering: map entries may be accepted in any order
Ah good point, seems like a "non-strict" decoder is allowed to allow them - I think we should be strict by default in the atproto-verse :)
maybe i will add a "strict" bool flag for decoding
Hm, it looks like `libipld.encode_dag_cbor()` is also non-strict with the order of map keys it emits:

```python
import libipld
import dag_cbor

encoded = libipld.encode_dag_cbor({'aaa': 1, 'x': 2})  # b'\xa2caaa\x01ax\x02'
dag_cbor.decode(encoded)  # raises dag_cbor.decoding.err.DAGCBORDecodingError
```
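For reference, a spec-compliant encoder would put the one-byte key `x` before the three-byte key `aaa`. Here's a minimal hand-rolled sketch (assuming only short string keys and small non-negative int values, so it's nowhere near a real encoder) showing what the canonical bytes should look like:

```python
def encode_tiny_map(d: dict) -> bytes:
    # toy canonical encoder: str keys under 24 bytes, int values 0..23 only
    items = sorted(((k.encode('utf-8'), v) for k, v in d.items()),
                   key=lambda kv: (len(kv[0]), kv[0]))
    out = bytes([0xA0 | len(items)])  # map header (assumes < 24 entries)
    for kb, v in items:
        out += bytes([0x60 | len(kb)]) + kb  # text string header + key bytes
        out += bytes([v])                    # small unsigned int value
    return out

print(encode_tiny_map({'aaa': 1, 'x': 2}))  # b'\xa2ax\x02caaa\x01'
```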
you found all my TODOs xd
do you check for "duplicate key" on encode? it's fixed in decode:
I don't check on encode because (as far as I know) it's impossible for a python dict to have duplicate keys in the first place.
It's plausible that edge cases exist where you have a str and a subclass of str that compares non-equal but has an equivalent string value (I don't think I allow str subclasses, tho)
```python
import json
import dag_cbor
import cbrrr
import libipld

class not_str(str):
    def __eq__(self, __value: object) -> bool:
        return False

    def __hash__(self) -> int:
        return super().__hash__()

hello = "hello"
not_hello = not_str("hello")

foo = {hello: "a", not_hello: "b"}

# !
print(foo)  # {'hello': 'a', 'hello': 'b'}

# !!
print(json.dumps(foo))  # {"hello": "a", "hello": "b"}

# !!!
print(dag_cbor.encode(foo))  # b'\xa2ehelloaaehelloab'

# !!!
print(libipld.encode_dag_cbor(foo))  # b'\xa2ehelloaaehelloab'

# :sunglasses emoji:
try:
    print(cbrrr.encode_dag_cbor(foo))
    assert(not "reachable")
except TypeError:
    pass
```
To be honest, this is such a contrived test case that it's probably not worth caring about, but for the record: it is possible to create a dict with pseudo-duplicate keys.
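One way an encoder could still guard against this (a hypothetical check, not something either library actually does) is to compare the *encoded* key bytes rather than trusting dict uniqueness, since dict uniqueness depends on `__hash__`/`__eq__` while the wire format only sees bytes:

```python
def assert_unique_encoded_keys(d: dict) -> None:
    # hypothetical encode-time guard: two dict keys that are "distinct"
    # per __eq__ may still serialize to identical key bytes
    seen = set()
    for k in d:
        kb = str(k).encode('utf-8')
        if kb in seen:
            raise TypeError(f'duplicate map key after encoding: {kb!r}')
        seen.add(kb)
```

This would catch the `not_str` case above, because both keys encode to `b'hello'`.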
fix for key order:
fixed in v1.2.2, already published on PyPI. thank you!! btw, it's worth rerunning the benchmarks, since perf was affected too
Here are 3 test cases:
They all decode successfully, but they should fail, per the spec. (hashberg's dag_cbor, and my own cbrrr, fail as expected)
The first is equivalent to the negative test case you can find here https://github.com/ipld/codec-fixtures/blob/d93d0de6eff0cf2c2efff010c6b2a79aaeb7d100/negative-fixtures/dag-cbor/decode/duplicate-keys.json (I should probably PR the other two test cases there!)
I believe this is an upstream issue with https://github.com/ipld/serde_ipld_dagcbor but I don't know enough rust to make a neat test case for it, so I'm hoping you can help me move this up the chain :)
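To illustrate the class of input involved (this is a hand-built payload for demonstration, not one of the actual test cases above): a map can claim two entries that share the same key bytes, and a strict decoder must reject that.

```python
# {"a": 1, "a": 2} as DAG-CBOR-shaped bytes: map(2), then key "a" twice
dup = b'\xa2aa\x01aa\x02'

def scan_map_keys(buf: bytes) -> list[bytes]:
    # toy scanner: assumes a single map of short text keys
    # with one-byte int values, enough to expose the duplicate
    n = buf[0] & 0x1F                     # entry count from the map header
    keys, i = [], 1
    for _ in range(n):
        klen = buf[i] & 0x1F              # key length from text-string header
        keys.append(buf[i + 1:i + 1 + klen])
        i += 1 + klen + 1                 # key header + key bytes + value byte
    return keys

keys = scan_map_keys(dup)
print(len(keys) != len(set(keys)))  # True: duplicate keys, must be rejected
```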