elm-toulouse / cbor

🤖 An Elm library implementing: RFC 7049 Concise Binary Object Representation (CBOR)
http://cbor.io/
MIT License

Thoughts about integers > 2^53? #4

Closed by mpizenberg 9 months ago

mpizenberg commented 10 months ago

For some use cases, it may be necessary to encode/decode integers larger than 2^53. In the CBOR spec, I see that there is a dedicated tag for unbounded integers. It seems this lib supports tags, so I guess that solves my concern for integers bigger than 2^64.

However, I'm still interested in having a way to encode/decode integers between 2^53 and 2^64, which are encoded as CBOR u64. So on the CBOR side it's an integer, and on the Elm side, let's say we have a way of representing these numbers with a bigint implementation, with strings, or whatever. What are your thoughts on the encoding/decoding? Does the API enable this in any way?

KtorZ commented 10 months ago

The annoying part with unbounded integers is that it forces users to adopt a 'BigInt' type across an entire codebase. At the moment, anything beyond 2^53 will explicitly fail to decode:

https://github.com/elm-toulouse/cbor/blob/3cf6ab18ea42e6d2f3c41c5d79645e06ee822e8d/src/Cbor/Decode.elm#L1257-L1273
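The linked decoder is not reproduced here. As a rough illustration of the kind of guard being described, the sketch below uses only `elm/bytes` (not this library's API) and assumes the initial CBOR byte has already been consumed. The name `safeU64` is hypothetical. It reads the 8-byte payload as two 32-bit words and fails whenever the value would reach 2^53.

```elm
module SafeU64 exposing (safeU64)

import Bytes exposing (Endianness(..))
import Bytes.Decode as D


{-| Hypothetical helper: decode an 8-byte big-endian unsigned integer,
failing when it cannot be represented exactly as an Elm Int.
-}
safeU64 : D.Decoder Int
safeU64 =
    D.map2 Tuple.pair (D.unsignedInt32 BE) (D.unsignedInt32 BE)
        |> D.andThen
            (\( hi, lo ) ->
                -- hi < 2^21 guarantees hi * 2^32 + lo < 2^53
                if hi < 2097152 then
                    D.succeed (hi * 4294967296 + lo)

                else
                    D.fail
            )
```

The check is done on the high word alone, so no intermediate value ever exceeds the range where JavaScript numbers are exact.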

I honestly don't think there's much else to do here from a library standpoint. I'd like to avoid forcing something like BigInt on all APIs, but we could imagine having a new module endpoint, bigint, which would decode integers as BigInt and be safe not only in the 2^53 to 2^64 range but also above it when tagged.

This way, the decision is left to the application down the line to choose the precision it needs.

mpizenberg commented 10 months ago

I totally agree with int < 2^53 being the default decoder. What I'm thinking of, as you said, is an additional endpoint. And in order to be independent of any particular implementation, I was thinking of returning a hex string instead.

-- Decode an integer or Bignum (tag 2) into a hex string
Cbor.Decode.bigint : Decoder String
-- Encode a hex string into an integer (if < 2^64) or Bignum (tag 2)
Cbor.Encode.bigint : String -> Encoder

I see there is tagged : Tag -> (a -> Encoder) -> a -> Encoder and raw : Bytes -> Encoder in the API. Are these enough to write something like the E.bigint above? Same question for the decoder equivalent.
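For reference, RFC 7049 represents a positive bignum as tag 2 wrapping a byte string that holds the magnitude in big-endian order. The sketch below builds those bytes directly with `elm/bytes`, independently of this library's `tagged` and `raw`. The names `positiveBignum` and `byteStringHeader` are hypothetical, and magnitudes of 65536 bytes or more are not handled.

```elm
module BignumWire exposing (positiveBignum)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as E


{-| Header of a CBOR byte string (major type 2) of the given length.
Only lengths below 65536 are supported in this sketch.
-}
byteStringHeader : Int -> E.Encoder
byteStringHeader len =
    if len < 24 then
        -- length fits in the initial byte
        E.unsignedInt8 (0x40 + len)

    else if len < 256 then
        -- one extra length byte
        E.sequence [ E.unsignedInt8 0x58, E.unsignedInt8 len ]

    else
        -- two extra length bytes, big-endian
        E.sequence [ E.unsignedInt8 0x59, E.unsignedInt16 BE len ]


{-| Wrap a big-endian magnitude as a tag 2 (positive bignum) item.
-}
positiveBignum : Bytes -> Bytes
positiveBignum magnitude =
    E.encode
        (E.sequence
            [ E.unsignedInt8 0xC2
            , byteStringHeader (Bytes.width magnitude)
            , E.bytes magnitude
            ]
        )
```

With the 9-byte magnitude `01 00 00 00 00 00 00 00 00` (2^64), this yields `c2 49 01 00 00 00 00 00 00 00 00`, which matches the example given in the RFC's appendix.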

KtorZ commented 10 months ago

@mpizenberg: "I was thinking of returning a hex string instead."

Not a big fan. If anything, I'd go for Bytes, which could use some well-known encoding with a chosen endianness. At the same time, there's already a BigInt library in Elm which looks decent, so we might as well just embrace that.

mpizenberg commented 10 months ago

Bytes is nice. By choosing big endian, which is the standard for most network protocols, it can also be very efficient as it's just copying the raw bytes from the CBOR encoded data.
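A minimal sketch of that idea, assuming only `elm/bytes` and a hypothetical decoder name `u64Payload`: since CBOR already stores integer payloads in big-endian order, the 8 bytes following the initial byte `0x1B` (unsigned integer, 64-bit argument) can be returned as-is.

```elm
module U64Payload exposing (u64Payload)

import Bytes exposing (Bytes)
import Bytes.Decode as D


{-| Hypothetical decoder: accept a CBOR unsigned integer with a 64-bit
argument and return its 8-byte big-endian payload unchanged.
-}
u64Payload : D.Decoder Bytes
u64Payload =
    D.unsignedInt8
        |> D.andThen
            (\initialByte ->
                if initialByte == 0x1B then
                    -- copy the payload without interpreting it
                    D.bytes 8

                else
                    D.fail
            )
```

Running `D.decode u64Payload input` gives `Just` the payload for a u64 item and `Nothing` otherwise. Shorter integer encodings and tagged bignums would need additional branches.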

I wouldn't go for a big int library. As you can see, we already don't agree on which library we would use, as I'd go for elm-natural