attic-labs / noms

The versioned, forkable, syncable database
Apache License 2.0
7.44k stars 266 forks source link

Add integer type #3802

Closed ORBAT closed 6 years ago

ORBAT commented 6 years ago

Hi folks!

To fix #3801, I've started work on adding an integer type backed by int64. This PR is more of an RFC, so the feature should not be considered finished.

What I'd like is some feedback on whether what I'm doing is

  1. sensible
  2. backwards-compatible
  3. the correct way to do this

My current idea is to not do uint* support, at least for the time being. It might be a good idea to add that while I'm at it, but this has turned out to be a, uh, non-trivial change.

Still to do (at least):

aboodman commented 6 years ago

So, actually the serialization of Noms' number type is currently arbitrary-precision, so it can represent integers. The only thing is that the API uses float64. What we should do instead is change the API to use some bignum abstraction (maybe optionally?).

I think my original thought was that you could marshal a Noms Number on to any of the builtin Go numeric types, or some other bignum type.

See https://github.com/attic-labs/noms/issues/1592

ORBAT commented 6 years ago

Thanks for the answer, not to mention the patience.

So, actually the serialization of Noms' number type is currently arbitrary-precision, so it can represent integers. The only thing is that the API uses float64. What we should do instead is change the API to use some bignum abstraction (maybe optionally?).

Ah, OK. So, I've got a few further questions and comments: looking at writeNumber, the way I take it is that the mantissa of Number is effectively limited by PutVarInt (so int64) if the serialization is to be backwards-compatible (?).

Personally I'd still like to have a separate integer type (although the serialization could be the same as with Number), as it'd allow you to be explicit about whether you expect floating point numbers or integers, leading to fewer surprises and runtime problems (with eg. truncation) or outright crashes as the types in noms would map more closely to types used by the application, and something that you decode to an integer will always be "backed" by an integer.

Having a separate type while allowing to have guarantees about truncation could be avoided somewhat with e.g. struct tags (unmarshaling could havenotrunc,integer or the like to make truncation an error, for example). What are your views on providing an explicit Int type vs. some other solution with just Number?

aboodman commented 6 years ago

Ah, OK. So, I've got a few further questions and comments: looking at writeNumber, the way I take it is that the mantissa of Number is effectively limited by PutVarInt (so int64) if the serialization is to be backwards-compatible (?).

My recollection is that the encoding used by PutVarint can be extended to arbitrary length (it's just that the api uses int64).

would a better way to approach this be having Number be backed by big.Float (plus modifying float64ToIntExp et al in number_util.go), + needed changes to the number-related decoders&encoders in marshal? How would this approach deal with serializing integers over math.MaxInt64?

There are a number of layers in the design to consider independently:

1/ What happens at the serialization level. How do we serialize numbers? Are there different types for different classes of numbers in Noms, or just one?

2/ What happens at the types.Value layer? What Golang types represent the Noms number types? Right now it is always float64. The sort of natural thing in the current design would be to change this to big.Float. We could extend types.Number with various conveniences to get in and out of the standard Go numeric types.

3/ What happens at the marshaling layer. Not sure if you're even familiar with this API layer, but there's the marshal package, which we used extensively in our own applications of Noms and was very convenient high-level API to work with Noms. Currently (I think) you can marshal Noms Number type onto any Golang numeric type, and it panics if the conversion can't be done cleanly.

Personally I'd still like to have a separate integer type (although the serialization could be the same as with Number)

At what layer are you thinking? We could introduce better support at higher layers for different kinds of numeric types without having to have explicit distinction at the serialization level.

, as it'd allow you to be explicit about whether you expect floating point numbers or integers, leading to fewer surprises and runtime problems (with eg. truncation) or outright crashes as the types in noms would map more closely to types used by the application, and something that you decode to an integer will always be "backed" by an integer.

In our usages, we used the marshal package to deal with this problem and other related ones ("what shape struct am I expecting at this location", etc).

aboodman commented 6 years ago

Looking at this all again I'm wondering why we didn't choose to serialize as a rational number, since it can represent more values accurately :-/. Bummer.

aboodman commented 6 years ago

@ORBAT I'm going to go ahead and close this in favor of eventually implementing something like: #3806