andrewchambers / hermes

Hermes software environment manager
MIT License
312 stars 8 forks source link

Instability in package hashing? #4

Open andrewchambers opened 4 years ago

andrewchambers commented 4 years ago

Hermes relies on a stable hashing algorithm of janet values (within janet releases is fine), here are a few things to consider:

Suppose we load a package graph into memory and immediately hash it using our custom hash function in https://github.com/andrewchambers/hermes/blob/master/src/pkghash.c, currently things like table iteration order depend on the order of contents in our hash table in memory. This is random in nature, but is stable enough for our purposes with a constant siphash seed. Two runs of the same package tree provide a stable package hash.

Now consider we marshal our graph, then unmarshal it. Interestingly the iteration order of our tables will change! Unmarshal does not reinsert table elements in the same order they were originally inserted. The package hash is now a function of the input data structures + the number of times it has been marshalled and unmarshalled!

Lucky for us, the number of marshal/unmarshal round trips is also constant (just once). This seems like a lucky kludge, but it works for our purposes.

Could we do better? Do we even care?

andrewchambers commented 4 years ago

Another problem is when Pointer values (buffers/cfunctions/etc...) are used as keys within structs and tables. We could probably error if this is detected while hashing, though it seems likely to be rare in practice.