TopEFT / topeft

15 stars 24 forks source link

Updates topeft to topcoffea.modules.histEFT #384

Closed btovar closed 4 months ago

btovar commented 1 year ago

Major changes:

Supersedes #371

btovar commented 1 year ago

The CI is failing because it depends on the unmerged topcoffea pr: https://github.com/TopEFT/topcoffea/pull/6

codecov[bot] commented 1 year ago

Codecov Report

Attention: Patch coverage is 35.54007% with 370 lines in your changes are missing coverage. Please review.

Project coverage is 26.27%. Comparing base (3ae0f05) to head (c37a242).

Files Patch % Lines
analysis/topeft_run2/comp.py 0.00% 257 Missing :warning:
topeft/modules/datacard_tools.py 66.12% 42 Missing :warning:
analysis/topeft_run2/sow_processor.py 0.00% 19 Missing :warning:
topeft/modules/get_renormfact_envelope.py 0.00% 18 Missing :warning:
topeft/modules/comp_datacard.py 76.27% 14 Missing :warning:
topeft/modules/dataDrivenEstimation.py 64.70% 12 Missing :warning:
analysis/topeft_run2/analysis_processor.py 80.00% 5 Missing :warning:
topeft/modules/yield_tools.py 92.00% 2 Missing :warning:
analysis/topeft_run2/update_json_sow.py 0.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #384 +/- ## ========================================== - Coverage 27.17% 26.27% -0.90% ========================================== Files 28 31 +3 Lines 4225 4552 +327 ========================================== + Hits 1148 1196 +48 - Misses 3077 3356 +279 ``` | [Flag](https://app.codecov.io/gh/TopEFT/topeft/pull/384/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TopEFT) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/TopEFT/topeft/pull/384/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TopEFT) | `26.27% <35.54%> (-0.90%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TopEFT#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

klannon commented 1 year ago

@jlawrenc @bryates @Andrew42 @sscruz Can we validate these changes and get them merged? I've seen some performance numbers that suggest we could reach 30-40 min today at ND, but we need this new histEFT merged to access that. Can we find someone to work on this as a top priority ASAP.

bryates commented 1 year ago

@jlawrenc @bryates @Andrew42 @sscruz Can we validate these changes and get them merged? I've seen some performance numbers that suggest we could reach 30-40 min today at ND, but we need this new histEFT merged to access that. Can we find someone to work on this as a top priority ASAP.

I'm happy with the CI running. I could try to make a pkl file of the full analysis if we want to do a complete validation.

jlawrenc commented 1 year ago

I can run a full analysis if people want starting now.

bryates commented 1 year ago

I can run a full analysis if people want starting now.

Let's both run it and see if we can at least match the yields from @kmohrman's official pkl file.

klannon commented 1 year ago

Any updates yet? I see for example that Andrea was planning to do some ratio plotting and the recommendation was to use features of coffea.Hist'/'HistEFT which we really want to deprecate and remove in favor of scikithep/hist/new histEFT. Until we get this merged and officially switch over, people are going to be wasting their time writing code that will have to be redone after the switch. It would be good if we could get this tested out ASAP!

anpicci commented 1 year ago

@jlawrenc @bryates consider that, if it is a matter of running a full analysis and check for errors, I'm available to do that if both of you cannot

Andrew42 commented 1 year ago

I have a "baseline" pkl file produced from master and I'm currently running the processor with @btovar's branch. I plan to run both pkl files through the datacard making tools and compare the outputs. If the differences are negligible I think we can call this PR good and get it merged. Thoughts?

bryates commented 1 year ago

@klannon I ran the processor on this and uses get_json_yields.py, but when I compare it to a25 from @kmohrman I get some noticable fractions of changes. @jlawrenc is also running, and if @Andrew42 can run the datacard maker instead, let's set what that gives.

bryates commented 1 year ago

@btovar @klannon there might be an issue with how the new histEFT is adding histograms together. I see some discrepancies between the yields in our paper, and running on this branch. We are also seeing an issue in the non-prompt background subtraction where at least 2018 has no non-prompt leptons.

btovar commented 12 months ago

@bryates My guess is that the new histEFT is fine, but that the changes in dataDrivenEstimation.py or datacard_tools.py are not. Since the current pr does pass the yield tests, do you know which cases are not being covered by the tests?

One big difference is that the new scikit hist puts NaN and overflow in the same bin. Could the differences that you see be related to that?

bryates commented 12 months ago

@bryates My guess is that the new histEFT is fine, but that the changes in dataDrivenEstimation.py or datacard_tools.py are not. Since the current pr does pass the yield tests, do you know which cases are not being covered by the tests?

One big difference is that the new scikit hist puts NaN and overflow in the same bin. Could the differences that you see be related to that?

Thanks @btovar, I forgot those scripts were updated as well. Maybe @Andrew42 can test the datacard maker (I think he already is).

btovar commented 12 months ago

It would be helpful if we could compare the histogram output after the final accumulation, but before all the post-processing. (It will also save time if we can work with a histogram that we feed to the post-processing rather than rerun the whole workflow every time.)

kmohrman commented 12 months ago

Just had a random thought on this (sorry if it's not relevant). But I think someone (@Andrew42?) mentioned the issue that is being investigated is with the non prompt contribution.

Do you see this discrepancy after running the non prompt estimation post processing script? If so, then I wonder if the issue could be related to the combining of non-EFT histEFTs with EFT histEFTs (i.e. histograms for processes that don't have any EFT coefficients with those that do). The non prompt contribution involves combining these types of contributions together (when doing the prompt subtraction). And if I recall correctly, this might be handled differently in @btovar's new histEFT compared to how it was handled in the old HistEFT.

Anyway, maybe this is not the issue, but just wanted to mention it just in case.

btovar commented 12 months ago

@kmohrman So, without coefficients declared, the new histEFT assumes one axis for "sm". I think that this what effectively coffea hist was doing.

bryates commented 11 months ago

Just to update the discussion here, I've tracked all the "issues" down the variable binning. Once I updated the comparison script to handle this, I saw full agreement. The only remaining issue was whether the non-prompt subtraction script is done before the rebinning (master branch) or after (this branch). I would like to very this does not have a large impact on our final limits. Otherwise, this branch should be good to merge.

btovar commented 11 months ago

That's great! I was worried it was something more fundamental.

Andrew42 commented 10 months ago

Looking through some of the fixes to the datacard maker, do we really want to have this fail silently?

bryates commented 10 months ago

Looking through some of the fixes to the datacard maker, do we really want to have this fail silently?

We might actually be able to revert that. I think it was related to the .values() change I made in my other PR. Would you mind debugging this? Otherwise, I can try it in a bit.

Andrew42 commented 10 months ago

We might actually be able to revert that. I think it was related to the .values() change I made in my other PR. Would you mind debugging this? Otherwise, I can try it in a bit.

ok, I'll take a look and see if this is still and issue

Andrew42 commented 10 months ago

After updating my topcoffea repo to include @bryates PR that fixes histogram add, the code runes w/o error when I use:

h[sp_key] += h[k]

but throws a KeyError when I use:

h[sp_key] += h._dense_hists[k]

However, this latter case is what is currently implemented and furthermore, it errors on every instance of entering that try...except block. So I think we certainly don't want to go with the current implementation.

I'm still trying to figure out if the new code is doing the group stuff correctly. I think the pkl file you pointed me to (newHist_systs_np.pkl.gz) is missing some samples, e.g. I noticed for 2lss channels it only had ttllJet_privateUL16APV and ttllJet_privateUL16, but not 2017 or 2018. I think you had mentioned seeing strange issues relating to the years when you were doing your checks, so I assume this might be related.

bryates commented 10 months ago

After updating my topcoffea repo to include @bryates PR that fixes histogram add, the code runes w/o error when I use:

h[sp_key] += h[k]

but throws a KeyError when I use:

h[sp_key] += h._dense_hists[k]

However, this latter case is what is currently implemented and furthermore, it errors on every instance of entering that try...except block. So I think we certainly don't want to go with the current implementation.

I'm still trying to figure out if the new code is doing the group stuff correctly. I think the pkl file you pointed me to (newHist_systs_np.pkl.gz) is missing some samples, e.g. I noticed for 2lss channels it only had ttllJet_privateUL16APV and ttllJet_privateUL16, but not 2017 or 2018. I think you had mentioned seeing strange issues relating to the years when you were doing your checks, so I assume this might be related.

Thanks for the update. I agree we should use

h[sp_key] += h[k]

now that the histEFT PR fixed that issues. Let's try and debug the group function some more. I had a suspicion about it as well, but only checked in the datacard maker. If the pkl file is missing items, that could be something else. I don't think the processor uses group.

btovar commented 10 months ago

I had trouble replicating what the old group did. In particular, the behavior of grouping by some_axis or [some_axis] seem different. Both would integrate, but one would seem preserve the axis in the final histogram. It is possible that I removed or added a [...] in the conversion.