Closed peterstangl closed 5 years ago
That's a brilliant idea!
Do I understand correctly that the idea is that the cost of hashing is negligible since it is done only once and is small compared to the calculation of all observables? Did you check how long it roughly takes (just to understand if it would affect e.g. a MeasurementLikelihood
with just a single, very fast observable)?
We should also add a comment to the doc string about the existence of caching and a comment in the code as to what the line defining predictions_key
is doing.
Concerning the hashing, I see a potential problem with hashing wc_obj
. The problem is that, for historical reasons, WilsonCoefficients
was initialized empty and WCs were set afterwards with set_initial
. But the default hash
of a user class will stay the same over the lifetime of the object, so it won't change when the WCs change.
For Wilson
, this would be less of an issue, as WCs are set on instantiation. Nevertheless, it would be better to have a custom __hash__
that also looks at the config dictionary etc. Since this is probably not too relevant for wilson
, this could be done aat the level of WilsonCoefficients
. The questions is how to get a fast hash. I will play around a bit.
Do I understand correctly that the idea is that the cost of hashing is negligible since it is done only once and is small compared to the calculation of all observables? Did you check how long it roughly takes (just to understand if it would affect e.g. a
MeasurementLikelihood
with just a single, very fast observable)?
The hashing takes around 30 μs on my laptop. How fast is the fastest observable? Probably still some orders of magnitude slower?
We should also add a comment to the doc string about the existence of caching and a comment in the code as to what the line defining
predictions_key
is doing.
OK, I will add some comments.
Concerning the hashing, I see a potential problem with hashing
wc_obj
. The problem is that, for historical reasons,WilsonCoefficients
was initialized empty and WCs were set afterwards withset_initial
. But the defaulthash
of a user class will stay the same over the lifetime of the object, so it won't change when the WCs change.For
Wilson
, this would be less of an issue, as WCs are set on instantiation. Nevertheless, it would be better to have a custom__hash__
that also looks at the config dictionary etc. Since this is probably not too relevant forwilson
, this could be done aat the level ofWilsonCoefficients
. The questions is how to get a fast hash. I will play around a bit.
It might be good to actually recompute the hash of a wc_obj
each time the WCs are changed and save this hash in the wc_obj
such that it can be retrieved very fast. It would be nice, if the hash is constructed only from the defining wc_dict
, the scale
and the basis
with
hash((frozenset(wc.dict.items()),wc.basis,wc.scale))
Using such a hash, the hash for two different wc_obj
would be the same if they describe the same point in an EFT basis.
So for me, hashing the par_dict
takes about 60 µs on my machine, hashing a general wcxf.WC.dict
about 230 µs. But here are some fast observables that only take about O(10 µs). However I guess it is OK to assume wcxf.WC
instances to remain unchanged during their lifetime.
OK so in conclusion I think we can merge your PR already, with only
wc_obj.__hash__()
(this is a private method) → hash(wc_obj)
I can then separaretly implement WilsonCoefficients.__hash__
. Assuming wcxf.WC
to be immutable, this would essentially amount to
hash((self.wc, frozenset(self._options)))
if I am not mistaken.
... actually since this is just 3 lines, you can just add this to your PR. in WilsonCoefficients
:
def __hash__(self):
"""Return a hash of the `WilsonCoefficient` instance.
This assumes that `self.wc` is not modified over its lifetime. The hash only changes when options are modified."""
hash((self.wc, frozenset(self._options)))
Sorry, this docstring is misleading, as the attribute self.wc
can indeed change, just not the instances themselves. Better:
"""Return a hash of the `WilsonCoefficient` instance.
The hash changes when Wilson coefficient values or options are modified.
It assumes that `wcxf.WC` instances are not modified after instantiation."""
I realized that this solution as of now has a memory leak: it caches all calls, which will quickly eat up all memory in a scan.
So either we need a LRU cache or, much simpler, just cache the last value called. This should be sufficient for our use case.
... actually since this is just 3 lines, you can just add this to your PR. in
WilsonCoefficients
:def __hash__(self): """Return a hash of the `WilsonCoefficient` instance. This assumes that `self.wc` is not modified over its lifetime. The hash only changes when options are modified.""" hash((self.wc, frozenset(self._options)))
This actually does not work since self._options
is not set on initialization. Should I just set it to None
on init?
Actually, None
does also not work, but I could set it to {}
.
@DavidMStraub I think this is ready to be merged.
This PR implements caching for predictions of observables inside a
MeasurementLikelihood
instance.