Closed wolfv closed 9 months ago
I tried to implement one of the simple hashing algorithms from the WWW, but unfortunately it's a bit tricky in Python because Python integers don't overflow :) With numpy it's simple, but I don't think anyone would want numpy as a dependency for that.
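Actually, this code seems to work:
# https://stackoverflow.com/a/14246007
def to_system_integer(value, bits, signed):
    """Wrap an arbitrary-precision int to a fixed-width (optionally signed) integer."""
    base = 1 << bits
    value %= base
    # After the modulo, bit_length() == bits means the top (sign) bit is set.
    return value - base if signed and value.bit_length() == bits else value

# from https://gist.github.com/amakukha/7854a3e910cb5866b53bf4b2af1af968
def hash_fnv1a_32(s):
    h = 0x811c9dc5  # FNV-1a 32-bit offset basis
    for x in s:
        # XOR in the character, then multiply by the 32-bit FNV prime.
        h = (ord(x) ^ h) * 0x01000193
        # Emulate unsigned 32-bit overflow, as the function name implies.
        h = to_system_integer(h, 32, False)
    return h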
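Fixed-width overflow can also be emulated by masking with & 0xFFFFFFFF after each multiplication. The following is a minimal sketch, not code taken from the gist: the name fnv1a_32 and the choice to hash UTF-8 bytes rather than code points are assumptions made for illustration.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a over a byte string (hypothetical helper, for illustration)."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        # Python ints never overflow, so keep only the low 32 bits by hand.
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


print(hex(fnv1a_32(b"")))                    # 0x811c9dc5 (the offset basis)
print(hex(fnv1a_32("a".encode("utf-8"))))    # 0xe40c292c
```

Hashing the encoded bytes keeps the result identical to what a C or Rust implementation would compute over the same UTF-8 data, which matters if the goal is reproducibility from other languages.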
Is this issue distinct from gh-181?
Checklist
What happened?
I've looked a bit into properly (byte-for-byte) reproducing the compressed artifacts in mamba and noticed that the builtin hash function is used for the file sorting. I think that's not a great choice because it makes the tarball less reproducible (e.g. from other programming languages, but also across different Python versions). E.g. Python 4.x might decide to use a different string hashing algorithm.
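To illustrate the concern: in CPython, string hashes also depend on the PYTHONHASHSEED setting, so the value is not even stable between interpreter runs unless the seed is pinned. The sketch below is only a demonstration under that assumption; the helper name str_hash_in_fresh_interpreter and the example path are made up for illustration.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text: str, seed: int) -> int:
    """Return hash(text) as computed by a new interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


first = str_hash_in_fresh_interpreter("info/index.json", seed=1)
again = str_hash_in_fresh_interpreter("info/index.json", seed=1)
other = str_hash_in_fresh_interpreter("info/index.json", seed=2)

print(first == again)  # same seed, same interpreter build: the values match
print(first == other)  # a different seed will almost always give a different value
```

Pinning the seed only helps within one interpreter build; it does nothing for other languages or for a future change to the hashing algorithm.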
I would propose to use some easy-to-implement string hashing algorithm instead (e.g. djb2: http://www.cse.yorku.ca/~oz/hash.html) or do away with it for sorting.
Conda Info
No response
Conda Config
No response
Conda list
No response
Additional Context
No response
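As a rough illustration of the djb2 proposal in the description above, here is a minimal sketch of a hash-based file ordering that does not depend on the builtin hash. The function names djb2_32 and sort_paths_reproducibly, the 32-bit width, the UTF-8 encoding, and the tie-break on the path itself are all assumptions for illustration, not what the project actually does.

```python
def djb2_32(data: bytes) -> int:
    """djb2 (h = h * 33 + byte), truncated to 32 bits after every step."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def sort_paths_reproducibly(paths):
    """Order paths by a stable hash, breaking ties by the path string itself."""
    return sorted(paths, key=lambda p: (djb2_32(p.encode("utf-8")), p))


files = ["lib/libfoo.so", "info/index.json", "bin/tool", "info/paths.json"]
ordered = sort_paths_reproducibly(files)
# The order depends only on the path bytes, not on the interpreter or its hash seed.
print(ordered == sort_paths_reproducibly(list(reversed(files))))  # True
```

Because the key is computed from the encoded bytes with explicit 32-bit wrap-around, the same ordering can be reproduced in any language with fixed-width unsigned integers.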