Closed peterstangl closed 5 years ago
Oh, shoot. This must have something to do with garbage collection: if I store the instances in a list, all hashes are unique. So apparently when an object is destroyed, its hash can be reused.
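A minimal sketch of the effect described above (CPython-specific; the class name is just illustrative, and whether the two hashes actually collide depends on the allocator):

```python
class Dummy:
    pass

# Without a custom __hash__, CPython derives an object's hash from its
# memory address.  An instance that is garbage-collected immediately can
# have its address -- and thus its hash -- reused by the next instance:
h1 = hash(Dummy())  # the object is collected right after hashing
h2 = hash(Dummy())  # may land at the same address -> same hash

# Keeping references alive prevents the reuse:
kept = [Dummy(), Dummy()]
assert hash(kept[0]) != hash(kept[1])
```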
What can we do about this? Data-based hashing will be way too slow...
... actually, we could just set the hash on instantiation to be a UUID. This will lead to the behaviour we had expected (even though it will give different hashes for instances with the same values, but that was the case in the original solution as well).
def __hash__(self):
return uuid.uuid4().hex
Yes, this was also what I was thinking about. Unique hashes for different instances are enough to solve the current problem. And as you say, data-based hashing is too slow, and it would have been the only option for giving instances with the same values the same hash.
def __hash__(self): return uuid.uuid4().hex
But the hash should be saved as a private attribute, right? Otherwise, on each call of __hash__(), a new id is generated.
Sure, stupid me.
And all hashes in Python I have seen so far were 64-bit integers. So perhaps we should also make sure that the UUID is turned into one.
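A hypothetical sketch of the idea so far (the class name is illustrative, not the actual wcxf.WC): draw the random value once at construction, store it privately, and truncate the UUID to 64 bits.

```python
import uuid

class WC:
    """Sketch: cache a random 64-bit per-instance hash at construction."""

    def __init__(self):
        # uuid4() carries 122 random bits; keep the top 64 bits so the
        # value fits a typical 64-bit hash width (assumption: this
        # truncation is acceptable for the use case discussed here)
        self._hash = uuid.uuid4().int >> 64

    def __hash__(self):
        # return the cached value, so repeated calls agree
        return self._hash
```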
Oh OK. In that case I guess using secrets.randbits(64) is much easier.
Belay that. secrets is not available in Python 3.5. random.randint(1, 2**64)?
Or simply hash(uuid.uuid4())?
That's 3 times slower on my system.
It's actually faster than uuid.uuid4().hex on my system, but yes, random.randint(1, 2**64) is 3 times faster.
But wait a second... the problem is that the original solution assumed that only wcxf.WC is practically immutable, but this does not apply to wilson.Wilson or flavio.WilsonCoefficients; so if we implement this random thing, it would have to be in the wcxf package (which would require releases of all 3 codes :unamused:)
If you don't want to make a new release now, the modification could be put in the development versions of wcxf and wilson (and will not affect anything). The caching in flavio should then be moved to a separate branch again and only merged after new versions of wcxf and wilson have been released.
Concerning the hash, it is possible that 32-bit Python uses 32-bit hashes. To be safe and always get a proper hash, one could use hash(random.randint(1, 2**64)), which is as fast as random.randint(1, 2**64) alone.
Btw, even faster (by almost a factor of 10 on my system) is hash(random.random()).
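Putting the pieces together, a sketch of the scheme the thread converges on (the mixin name is hypothetical, not from any of the packages): draw one random per-instance hash at construction and return the cached value forever.

```python
import random

class CachedHashMixin:
    """Sketch: random per-instance hash, drawn once at construction."""

    def __init__(self):
        # hash(random.random()) is already a machine-sized int, so no
        # explicit truncation to 64 bits is needed; caching it makes
        # repeated __hash__ calls consistent for the same instance
        self._hash = hash(random.random())

    def __hash__(self):
        return self._hash
```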
This can be closed, no?
Yes
There is an issue with the hash of WilsonCoefficient as implemented in https://github.com/flav-io/flavio/commit/48347a13598ba510a2782bf815c8d78315e62206: https://github.com/flav-io/flavio/blob/48347a13598ba510a2782bf815c8d78315e62206/flavio/physics/eft.py#L136-L141
The problem is that different wcxf.WC instances based on different Wilson coefficients can have the same hash. This can be demonstrated as follows:
I don't know where this behaviour is coming from, but to use the caching as implemented in https://github.com/flav-io/flavio/commit/48347a13598ba510a2782bf815c8d78315e62206, it will be necessary to make sure that different wcxf.WC instances will have different hashes. This also applies to wilson.