cbor-wg / edn-literal

Application-oriented literals for CBOR extended diagnostic notation

Clarify: Long numbers and bignums #38

Closed chrysn closed 3 weeks ago

chrysn commented 1 month ago

Implementing EDN, I found no help in the text:

If an EDN processor encounters (say) 987654321098765432310, should it err out or process that into a tag 2 bignum?

Similarly, if a floating point exponent exceeds the expressible range, should it produce a tag 5 bigfloat? Should it also produce a bigfloat if there are digits in the decimal representation that would be discarded by float conversion?

(I was leaning toward "yes" for the first question, but that would point towards "yes" for the later questions as well, and that becomes increasingly difficult while at the same time becoming increasingly unpredictable).

chrysn commented 1 month ago

Reading cbor-cde-02 would indicate the other direction. While not directly interacting with EDN, it does unify tag2/3 into the major type 0/1 integers, while explicitly not making such statements on tag 5.

Questions remain on encoding indicators. Does 987654321098765432310_i become valid, because it is 2(h'358a750438f380f5f6'_i), or are encoding indicators just not applicable to larger numbers, as they require the (still legal) alternative form where both lengths can be given (2_3(h'358a750438f380f5f6'_3) sounds legal)?
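To make the two spellings concrete, here is a minimal Python sketch of the bytes each would produce. It assumes `_i` means "argument carried in the initial byte" and `_0` to `_3` select 1-, 2-, 4- and 8-byte arguments, which is how this thread uses them; the `head` helper and the lookup table are hypothetical names, not taken from any EDN implementation.

```python
# Sketch only: maps an encoding indicator to the CBOR head it asks for.
# Assumption: "i" = immediate (argument < 24 in the initial byte),
# "0".."3" = additional information 24..27 (1, 2, 4, 8 argument bytes).
ARG_SIZE = {"0": (24, 1), "1": (25, 2), "2": (26, 4), "3": (27, 8)}


def head(major: int, arg: int, indicator: str = "i") -> bytes:
    """Build a CBOR head for `major` and `arg` using the given indicator."""
    if indicator == "i":
        if arg >= 24:
            raise ValueError("argument does not fit in the initial byte")
        return bytes([major << 5 | arg])
    ai, size = ARG_SIZE[indicator]
    # int.to_bytes raises OverflowError if arg does not fit in `size` bytes.
    return bytes([major << 5 | ai]) + arg.to_bytes(size, "big")


payload = bytes.fromhex("358a750438f380f5f6")  # 987654321098765432310

# 2(h'358a750438f380f5f6'_i): tag 2 and the length 9 both fit in the initial byte.
short_form = head(6, 2) + head(2, len(payload), "i") + payload

# 2_3(h'358a750438f380f5f6'_3): both heads carry 8-byte arguments.
long_form = head(6, 2, "3") + head(2, len(payload), "3") + payload

print(short_form.hex())
print(long_form.hex())
```

Both byte strings decode to the same tag 2 value; only the head lengths differ, which is why the unpacked form already covers the test-vector use case.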

cabo commented 1 month ago

I agree that the general direction of CBOR has been to make the threshold between mt0/1 integers and tag2/3 integers smaller. No such thing happened with mt7 float and tag 4/5.

Re the encoding indicator: When would you ever want to use a long form for tag 2/3? This is only really relevant for test vectors, and the 2_3(h'358a750438f380f5f6'_3) form covers those. Someone could register an encoding indicator that simplifies this, if we ever want it. So the remaining question is what 987654321098765432310_i might mean. This could be for the byte string head. But we also have leading zero bytes in the byte string. RFC 8949 says:

The preferred serialization of the byte string is to leave out any leading zeroes

so an encoding indicator could be used to leave space using leading zero bytes. This might be used in those "template in ROM, fill in the blanks" situations. Existing encoding indicators don't really fit here, so I'd also leave this to an extension (the unpacked form is available until then).
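As an illustration of the leading-zero idea, the following Python sketch writes a tag 2 bignum into a fixed-width byte string, padding with leading zero bytes. The function name `encode_bignum_fixed_width` and the restriction to widths below 24 are assumptions made for brevity; no existing encoding indicator expresses this.

```python
# Sketch only: a non-preferred tag 2 bignum with a fixed-width payload,
# as might be used to reserve space in a template that is filled in later.
def encode_bignum_fixed_width(n: int, width: int) -> bytes:
    """Encode non-negative `n` as tag 2 with exactly `width` payload bytes."""
    if n < 0:
        raise ValueError("tag 2 carries non-negative values only")
    if not 0 <= width < 24:
        raise ValueError("sketch handles only lengths that fit in the initial byte")
    # Big-endian, left-padded with zero bytes; raises OverflowError if too narrow.
    payload = n.to_bytes(width, "big")
    return bytes([0xC2, 0x40 | width]) + payload


n = 987654321098765432310
padded = encode_bignum_fixed_width(n, 12)  # three leading zero bytes
exact = encode_bignum_fixed_width(n, 9)    # preferred: no leading zeroes

# Both payloads denote the same integer once the padding is ignored.
assert int.from_bytes(padded[2:], "big") == int.from_bytes(exact[2:], "big") == n
print(padded.hex())
print(exact.hex())
```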

chrysn commented 1 month ago

From today's interim: doing encoding indicators for long numbers would be leave-to-extension, but stating that longnums are supposed to become tag2/3 would be helpful text at this stage already.
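A minimal Python sketch of that reading, assuming an integer literal outside the 64-bit range of major types 0/1 is emitted as a tag 2 or tag 3 bignum in preferred serialization (RFC 8949, Section 3.4.3). The helper names `encode_head` and `encode_integer_literal` are hypothetical, and floats and bigfloats are deliberately left out.

```python
# Sketch only: decimal integer literal -> CBOR bytes, with bignum fallback.
def encode_head(major: int, arg: int) -> bytes:
    """Shortest CBOR head for a major type and a 64-bit argument."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_integer_literal(text: str) -> bytes:
    """Encode a decimal integer literal as mt0/mt1, or as tag 2/3 if too large."""
    n = int(text, 10)
    # Negative integers carry the argument -1 - n (major type 1 and tag 3 alike).
    major, arg = (0, n) if n >= 0 else (1, -1 - n)
    if arg < 1 << 64:
        return encode_head(major, arg)
    # Too large for mt0/mt1: byte string without leading zeroes inside tag 2 or 3.
    payload = arg.to_bytes((arg.bit_length() + 7) // 8, "big")
    tag = 2 if n >= 0 else 3
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload


print(encode_integer_literal("987654321098765432310").hex())  # c249358a750438f380f5f6
print(encode_integer_literal("18446744073709551615").hex())   # 1bffffffffffffffff
```

The second call shows the boundary: the largest value that still fits major type 0 stays a plain integer, and only the next value up switches to tag 2.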