attic-labs / noms

The versioned, forkable, syncable database
Apache License 2.0
7.45k stars 267 forks source link

No integer type #3801

Closed ORBAT closed 6 years ago

ORBAT commented 6 years ago

The current noms numeric data type is backed by float64, so integers above 253 + 1 can't be represented

coriolinus commented 6 years ago

The workaround we're using:

// Int is an integer which can serialize itself to a noms blob.
//
// Noms doesn't provide a native integer type, so it's on us
// to handle this case. We manage it by providing this type wrapper which
// knows how to serialize itself to and from `[]byte`.
//
// Simple big-endian serialization is used.
type Int int64

// ToBlob converts this Int into a Blob datatype
func (b Int) ToBlob(db datas.Database) nt.Blob {
    bytes := make([]byte, 8)
    binary.BigEndian.PutUint64(bytes, uint64(b))
    return Blob(db, bytes)
}

// IntFromBlob creates a Int from a blob, if possible
func IntFromBlob(blob nt.Blob) (Int, error) {
    bytes, err := Unblob(blob)
    if err != nil {
        return Int(0), err
    }

    return Int(int64(binary.BigEndian.Uint64(bytes))), nil
}

Where Blob and Unblob are simple utility functions to convert between Blob and []byte.

aboodman commented 6 years ago

That works. FWIW, Noms provides built-in marshal interface. See:

https://godoc.org/github.com/attic-labs/noms/go/marshal#Marshal and https://godoc.org/github.com/attic-labs/noms/go/marshal#Marshaler.

So you can just say:

type Int int64

func (i Int) MarshalNoms(...) {
  ...
}

func (i *Int) UnmarshalNoms(..) {
  ...
}

Little more convenient. Also, backing an integer by a blob is a little unfortunate because there is a small chance (about 8/4096) of it being stored by multiple chunks on disk. If you back by a string instead, then it will always be one chunk.

Finally, I think it would be nicer if you could tell the difference between the "Int" type and other blob/string types in the noms data. So I might do something like this, personally:

type Int int

func (i Int) MarshalNoms(_ types.ValueReadWriter) (val types.Value, err error) {
  return types.NewStruct("Int", map[string]types.Value{
    "data": types.String(intToString(int(i))),
  }
}

// intToString would just be the same as converting to a byte array, but cast to string at the end.

... same for UnmarshalNoms in reverse ...

This way when you do like noms show on the CLI, or ask some data for its Type() or in any other way interact with the the Noms data, you'll be able to distinguish your integer type from other blobs/strings.

aboodman commented 6 years ago

Hello @coriolinus and @ORBAT. I hate seeing you have to work around shortcomings in Noms.

Please take a look at https://github.com/attic-labs/noms/pull/3806 and provide any feedback you may have. Thanks!

aboodman commented 6 years ago

Closing in favor of #3807.