arangodb / velocypack

A fast and compact format for serialization and storage

Endianess, bitness #101

Closed dumblob closed 3 years ago

dumblob commented 3 years ago

The readme says it's platform independent. Does that mean it's also fully endianness-independent as well as bitness-independent (32 bit, 64 bit, ... bit)?

jsteemann commented 3 years ago

@dumblob: serialized booleans, integers, strings, arrays, objects etc. all have a defined endianness and length, which is platform-independent. So machine endianness and bit size won't matter for these.
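To illustrate what a defined endianness and length means in practice, here is a minimal sketch that is not taken from the velocypack code base. It assumes little-endian byte order and a fixed 8 byte width purely for illustration; the function names `encode_u64_le` and `decode_u64_le` are hypothetical. Because the byte order is spelled out in the encoder and decoder, the result does not depend on the byte order of the machine running the code.

```python
import sys


def encode_u64_le(value: int) -> bytes:
    """Encode an unsigned 64 bit integer as 8 bytes, least significant byte first."""
    return value.to_bytes(8, byteorder="little", signed=False)


def decode_u64_le(data: bytes) -> int:
    """Decode 8 little-endian bytes back into an unsigned integer."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(data, byteorder="little", signed=False)


encoded = encode_u64_le(0x0102030405060708)

# The host byte order is reported only for comparison; it does not
# influence the encoded bytes.
print("host byte order:", sys.byteorder)
print("encoded bytes:  ", encoded.hex())
print("round trip ok:  ", decode_u64_le(encoded) == 0x0102030405060708)
```

The same bytes come out on a little-endian and a big-endian host, which is the property described above.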

There are still a few caveats when it comes to endianness and bit size. Regarding bit size, the only currently supported values are 32 and 64 bits. This should not be a problem in practice, although in the distant past there were 36 bit systems and the like. In addition, it is possible to build very large values on a 64 bit system that cannot be read back on a 32 bit system, because the maximum allocation size on a 32 bit system may be severely limited compared to a 64 bit system. If all velocypack values are kept small enough that they stay well below the 32 bit length boundaries, this should not matter.
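A producer on a 64 bit system could guard against this with a simple size check before handing a value to 32 bit consumers. The following is only a sketch: the threshold `CONSERVATIVE_32BIT_LIMIT` and the function `readable_on_32bit` are hypothetical and not part of velocypack, and the real allocation limit on a 32 bit system depends on the operating system and the process.

```python
# Hypothetical threshold, chosen only for illustration: half of the 32 bit
# address space. Real 32 bit processes may be able to allocate less.
CONSERVATIVE_32BIT_LIMIT = 2**31 - 1


def readable_on_32bit(byte_length: int, limit: int = CONSERVATIVE_32BIT_LIMIT) -> bool:
    """Return True if a serialized value of this size stays below the limit."""
    if byte_length < 0:
        raise ValueError("byte_length must not be negative")
    return byte_length <= limit


# A 1 KiB value is fine; a 4 GiB value exceeds any 32 bit length.
print(readable_on_32bit(1024))
print(readable_on_32bit(2**32))
```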

The velocypack type External is also just a raw pointer and should only be used while building up values that reference other values that exist elsewhere in memory. It shouldn't be used for serialization because it is a pointer into memory and therefore not portable. Not using the External type for any serialized data avoids this problem too.

The velocypack type Custom is completely user-defined, so it is up to the embedder whether these types are portable or not. There is no default implementation for Custom types, so they only become a potential portability issue if the embedder actively adds code for them.

Last but not least, Double values are serialized as integer equivalents in a specific way, and deserialized back into IEEE-754 double-precision floating point values. We found this to be sufficiently portable for our needs, although at least in theory there may be issues on some systems. Here is what we used as backing for our assumptions (from https://en.wikipedia.org/wiki/Endianness#Floating_point):

"It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness. Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may in practice safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type."
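The idea of serializing a double via its integer equivalent can be sketched as follows. This is an illustration under stated assumptions, not the velocypack implementation: it assumes the 64 bit IEEE-754 bit pattern is reinterpreted as an unsigned integer and written as 8 little-endian bytes, and the function names are hypothetical.

```python
import struct


def double_to_u64(x: float) -> int:
    """Reinterpret the IEEE-754 bit pattern of a double as an unsigned 64 bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def u64_to_double(bits: int) -> float:
    """Reinterpret an unsigned 64 bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def serialize_double(x: float) -> bytes:
    """Write the integer equivalent with an explicit (assumed) little-endian order."""
    return double_to_u64(x).to_bytes(8, byteorder="little")


def deserialize_double(data: bytes) -> float:
    """Read the integer back with the same byte order and turn it into a double."""
    return u64_to_double(int.from_bytes(data, byteorder="little"))


payload = serialize_double(1.0)
print(payload.hex())
print(deserialize_double(payload) == 1.0)
```

Once the double is treated as an integer, the only portability assumption left is that floating point and integer values share the same byte order on the host, which is what the quoted passage describes.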

I can't tell whether this assumption holds true for ARM processors or other non-x86 processors. Right now we only build for x86_64. ARM coverage will hopefully be added within the next 12 months.

I should also add that the CI currently only covers 64 bit builds, as we only offer ArangoDB in 64 bit versions and are not developing actively on 32 bit platforms anymore.

dumblob commented 3 years ago

Thanks, this answered my question.

Do you want to add this information to the readme and to documentation?

Btw. ARM support (incl. 32bit ARM) is really necessary - looking forward to ARM coverage :wink:.

jsteemann commented 3 years ago

Yes, good point. I will add this to the README at some point this week.

jsteemann commented 3 years ago

Finally added portability notes to the README/Spec: https://github.com/arangodb/velocypack/blob/main/VelocyPack.md#Portability