Closed: btovar closed this PR 4 months ago.
The CI is failing because it depends on the unmerged topcoffea PR: https://github.com/TopEFT/topcoffea/pull/6
Attention: Patch coverage is 35.54007%, with 370 lines in your changes missing coverage. Please review.
Project coverage is 26.27%. Comparing base (3ae0f05) to head (c37a242).
@jlawrenc @bryates @Andrew42 @sscruz Can we validate these changes and get them merged? I've seen some performance numbers that suggest we could reach 30-40 min today at ND, but we need this new histEFT merged to access that. Can we find someone to work on this as a top priority ASAP?
I'm happy with the CI running. I could try to make a pkl file of the full analysis if we want to do a complete validation.
I can run a full analysis starting now, if people want.
Let's both run it and see if we can at least match the yields from @kmohrman's official pkl file.
Any updates yet? I see, for example, that Andrea was planning to do some ratio plotting and the recommendation was to use features of `coffea.Hist`/`HistEFT`, which we really want to deprecate and remove in favor of scikit-hep/hist and the new histEFT. Until we get this merged and officially switch over, people are going to be wasting their time writing code that will have to be redone after the switch. It would be good if we could get this tested out ASAP!
@jlawrenc @bryates Consider that, if it is just a matter of running a full analysis and checking for errors, I'm available to do that if neither of you can.
I have a "baseline" pkl file produced from master
and I'm currently running the processor with @btovar's branch. I plan to run both pkl files through the datacard making tools and compare the outputs. If the differences are negligible I think we can call this PR good and get it merged. Thoughts?
@klannon I ran the processor on this and used `get_json_yields.py`, but when I compare it to a25 from @kmohrman I get some noticeable fractional changes. @jlawrenc is also running, and if @Andrew42 can run the datacard maker instead, let's see what that gives.
@btovar @klannon There might be an issue with how the new histEFT is adding histograms together. I see some discrepancies between the yields in our paper and running on this branch. We are also seeing an issue in the non-prompt background subtraction where at least 2018 has no non-prompt leptons.
@bryates My guess is that the new histEFT is fine, but that the changes in `dataDrivenEstimation.py` or `datacard_tools.py` are not. Since the current PR does pass the yield tests, do you know which cases are not being covered by the tests?
One big difference is that the new scikit-hep hist puts NaN and overflow in the same bin. Could the differences that you see be related to that?
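For reference, plain scikit-hep hist shows this directly (a standalone sketch, independent of histEFT):

```python
import numpy as np
import hist

h = hist.Hist(hist.axis.Regular(4, 0.0, 4.0, name="x"))
h.fill(x=[0.5, 10.0, np.nan])  # 10.0 is out of range; NaN is unorderable

# view(flow=True) lays out [underflow, bin0..bin3, overflow]:
print(h.view(flow=True))  # [0. 1. 0. 0. 0. 2.] -> 10.0 and NaN share the overflow bin
```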
Thanks @btovar, I forgot those scripts were updated as well. Maybe @Andrew42 can test the datacard maker (I think he already is).
It would be helpful if we could compare the histogram output after the final accumulation, but before all the post-processing. (It will also save time if we can work with a histogram that we feed to the post-processing rather than rerun the whole workflow every time.)
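Something like this gzipped-pickle round trip would make that convenient (file and variable names are hypothetical, matching the `.pkl.gz` convention used elsewhere in this thread):

```python
import gzip
import pickle

# output: the histogram dict returned by the processor (hypothetical name).
# Save it once, right after the final accumulation:
with gzip.open("accumulated_hists.pkl.gz", "wb") as f:
    pickle.dump(output, f)

# Each post-processing iteration then reloads it instead of rerunning the workflow:
with gzip.open("accumulated_hists.pkl.gz", "rb") as f:
    output = pickle.load(f)
```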
Just had a random thought on this (sorry if it's not relevant), but I think someone (@Andrew42?) mentioned that the issue being investigated is with the non-prompt contribution.
Do you see this discrepancy after running the non-prompt estimation post-processing script? If so, then I wonder if the issue could be related to the combining of non-EFT histEFTs with EFT histEFTs (i.e. histograms for processes that don't have any EFT coefficients with those that do). The non-prompt contribution involves combining these types of contributions together (when doing the prompt subtraction). And if I recall correctly, this might be handled differently in @btovar's new histEFT compared to how it was handled in the old HistEFT.
Anyway, maybe this is not the issue, but I wanted to mention it just in case.
@kmohrman So, without coefficients declared, the new histEFT assumes one axis entry for "sm". I think this is effectively what coffea hist was doing.
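As a conceptual sketch in plain numpy (not the actual histEFT API, and with an assumed coefficient layout): treating a non-EFT histogram as having only the "sm" coefficient lets it be added bin-by-bin to an EFT histogram's coefficient array.

```python
import numpy as np

# Assumed layout: per bin, an EFT histogram stores the quadratic-fit
# coefficients in the Wilson coefficients, with index 0 as the "sm" term.
n_bins, n_coeffs = 5, 6
eft_counts = np.random.rand(n_bins, n_coeffs)

# A sample with no declared coefficients has only SM yields; padding the
# remaining coefficients with zeros makes the two shapes compatible.
plain_counts = np.random.rand(n_bins)
padded = np.zeros((n_bins, n_coeffs))
padded[:, 0] = plain_counts

combined = eft_counts + padded  # adding EFT and non-EFT contributions
```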
Just to update the discussion here, I've tracked all the "issues" down to the variable binning. Once I updated the comparison script to handle this, I saw full agreement. The only remaining issue is whether the non-prompt subtraction script runs before the rebinning (master branch) or after (this branch). I would like to verify this does not have a large impact on our final limits. Otherwise, this branch should be good to merge.
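For reference, a sketch of the kind of fix involved in the comparison script (edges and yields are made up): rebin the finer-binned output onto the variable edges first, so both arrays describe the same bins.

```python
import numpy as np

coarse_edges = np.array([0.0, 1.0, 2.5, 5.0, 10.0])  # variable-width bins
fine_edges = np.linspace(0.0, 10.0, 21)              # uniform 0.5-wide bins
fine_yields = np.random.rand(20)                     # stand-in for one branch's yields

# Map each fine bin (by its left edge) onto the coarse bin containing it;
# this assumes every coarse edge is also a fine edge.
idx = np.searchsorted(coarse_edges, fine_edges[:-1], side="right") - 1
coarse_yields = np.bincount(idx, weights=fine_yields, minlength=len(coarse_edges) - 1)

# Rebinning must preserve the total yield.
assert np.isclose(coarse_yields.sum(), fine_yields.sum())
```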
That's great! I was worried it was something more fundamental.
Looking through some of the fixes to the datacard maker, do we really want to have this fail silently?
We might actually be able to revert that. I think it was related to the `.values()` change I made in my other PR. Would you mind debugging this? Otherwise, I can try it in a bit.
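For context, a generic illustration of the failure mode (all names hypothetical, not the actual datacard-maker code): a `try...except KeyError` around the grouping step cannot distinguish "first entry for this group" from "bad histogram lookup".

```python
import hist

# All names hypothetical: toy inputs standing in for the datacard maker's state.
axis = hist.axis.Regular(4, 0.0, 1.0, name="x")
h = {"ttll_UL17": hist.Hist(axis), "ttll_UL18": hist.Hist(axis)}
mapping = {"ttll_UL17": "ttll", "ttll_UL18": "ttll", "ttll_UL16": "ttll"}  # UL16 missing from h

# The "silent" pattern: any KeyError, whatever its real cause, skips the sample.
grouped = {}
for k, sp_key in mapping.items():
    try:
        if sp_key in grouped:
            grouped[sp_key] += h[k]
        else:
            grouped[sp_key] = h[k]
    except KeyError:
        continue  # the misnamed ttll_UL16 sample is silently dropped

# Failing loudly instead surfaces the bad lookup at its source.
grouped = {}
for k, sp_key in mapping.items():
    if k not in h:
        raise KeyError(f"no histogram found for sample {k!r}")
    if sp_key in grouped:
        grouped[sp_key] += h[k]
    else:
        grouped[sp_key] = h[k]
```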
OK, I'll take a look and see if this is still an issue.
After updating my topcoffea repo to include @bryates' PR that fixes histogram add, the code runs w/o error when I use:
`h[sp_key] += h[k]`
but throws a `KeyError` when I use:
`h[sp_key] += h._dense_hists[k]`
However, this latter case is what is currently implemented, and furthermore it errors on every instance of entering that `try...except` block. So I think we certainly don't want to go with the current implementation.
I'm still trying to figure out if the new code is doing the `group` stuff correctly. I think the pkl file you pointed me to (`newHist_systs_np.pkl.gz`) is missing some samples, e.g. I noticed for `2lss` channels it only had `ttllJet_privateUL16APV` and `ttllJet_privateUL16`, but not 2017 or 2018. I think you had mentioned seeing strange issues relating to the years when you were doing your checks, so I assume this might be related.
Thanks for the update. I agree we should use `h[sp_key] += h[k]` now that the histEFT PR fixed that issue. Let's try and debug the `group` function some more. I had a suspicion about it as well, but only checked in the datacard maker. If the pkl file is missing items, that could be something else. I don't think the processor uses `group`.
I had trouble replicating what the old `group` did. In particular, the behavior of grouping by `some_axis` or `[some_axis]` seems different. Both would integrate, but one seems to preserve the axis in the final histogram. It is possible that I removed or added a `[...]` in the conversion.
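An analogous distinction exists in scikit-hep hist's UHI indexing, which may be where the behavior difference crept in during the conversion (a sketch with hypothetical process names, not the old coffea `group` API):

```python
import hist

h = hist.Hist(
    hist.axis.StrCategory(["ttll", "ttH", "tllq"], name="proc"),
    hist.axis.Regular(10, 0.0, 1.0, name="x"),
)

# Selecting with a list keeps the categorical axis (restricted to those bins)...
kept = h[[0, 1], :]
print(kept.ndim)    # 2: "proc" axis preserved

# ...while indexing with sum integrates the axis away entirely.
summed = h[sum, :]
print(summed.ndim)  # 1: "proc" axis gone
```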
Major changes:
- `h._sumw[...]` to `h.view(as_dict=True, flow=True)[...]`
- `len(h.values()) == 0` created a copy of the values just to check if the histogram was empty; now using `h.empty()`
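For the emptiness check specifically, `empty()` comes from the underlying boost-histogram API, so plain scikit-hep hist shows the same behavior (a sketch independent of histEFT):

```python
import hist

h = hist.Hist(hist.axis.Regular(4, 0.0, 1.0, name="x"))
print(h.empty())           # True: no entries filled yet
h.fill(x=[0.25])
print(h.empty())           # False, without materializing a values array
print(h.empty(flow=True))  # pass flow=True to count under/overflow entries too
```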
Supersedes #371