TopEFT / topeft

15 stars 24 forks source link

[WIP] Fix card bin clipping #331

Open kmohrman opened 1 year ago

kmohrman commented 1 year ago

Opening a PR for this branch created by @Andrew42 a few weeks ago. The main purpose of the PR is to enable easier comparison against master (not sure if we will eventually want to merge this or not).

kmohrman commented 1 year ago

@Andrew42 I am wondering if you intended for the changes on this branch (which were implemented I think to avoid some t2w crash) to be eventually merged into the master? If I'm remembering correctly, I think we agreed not to merge it right away (since the way the solution was implemented would lead to very tiny numerical differences in the histogram contents) but I am wondering if you think it would be useful to merge this after TOP-22-006 is out the door (in which case we should leave the PR open), or if you intended these changes to just be temporary workaround (in which case we could close the PR).

kmohrman commented 1 year ago

@Andrew42 I am wondering if you intended for the changes on this branch (which were implemented I think to avoid some t2w crash) to be eventually merged into the master? If I'm remembering correctly, I think we agreed not to merge it right away (since the way the solution was implemented would lead to very tiny numerical differences in the histogram contents) but I am wondering if you think it would be useful to merge this after TOP-22-006 is out the door (in which case we should leave the PR open), or if you intended these changes to just be temporary workaround (in which case we could close the PR).

@Andrew42 just wondering if you've had a chance to think about this.

Andrew42 commented 1 year ago

After discussing this with @kmohrman, I think the changes here are useful to have as they dealt with the edge case where all bins in category/histogram were zero, which makes t2w upset.

The core of the changes made it so that when we crop negative bins, we don't crop them to zero, but rather to some small non-negative value. This is why these changes impacted the final yields, but by a very small/inconsequential amount even for the cases where you weren't get the t2w crash, since you could still have some clipped bins in a normal running that would now be some very small, but non-zero value.

It also added a number of print statements to warn/check when the clipping is being done to bins with a 'large' negative yield, which is probably something we want the user to be notified of.

kmohrman commented 1 year ago

Sounds good. I think we should go ahead and merge this into the technical_improvements branch. However since this was branched off of the master, and there have been many changes to technical_improvements that would not be present in this branch, @Andrew42 I'm wondering if you could git merge technical_improvements into this branch before we merge the PR?

We should also understand why the CI is failing.

codecov[bot] commented 1 year ago

Codecov Report

Merging #331 (5c7b22c) into master (02e32cb) will increase coverage by 3.72%. The diff coverage is 50.86%.

:exclamation: Current head 5c7b22c differs from pull request most recent head 93e3d03. Consider uploading reports for the commit 93e3d03 to get more accurate results

@@            Coverage Diff             @@
##           master     #331      +/-   ##
==========================================
+ Coverage   30.27%   33.99%   +3.72%     
==========================================
  Files          47       38       -9     
  Lines        7603     6459    -1144     
==========================================
- Hits         2302     2196     -106     
+ Misses       5301     4263    -1038     
Flag Coverage Δ
unittests 33.99% <50.86%> (+3.72%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
analysis/topEFT/get_datacard_yields.py 0.00% <0.00%> (ø)
analysis/topEFT/get_yield_json.py 91.66% <ø> (ø)
.../topEFT/make_1d_quad_plots_from_template_histos.py 0.00% <0.00%> (ø)
analysis/topEFT/make_cr_and_sr_plots.py 0.00% <0.00%> (ø)
analysis/topEFT/make_jsons.py 0.00% <0.00%> (ø)
analysis/topEFT/make_skim_jsons.py 0.00% <0.00%> (ø)
analysis/topEFT/missing_parton.py 0.00% <0.00%> (ø)
analysis/topEFT/parse_datacard_templtes.py 0.00% <0.00%> (ø)
analysis/topEFT/run_sow.py 0.00% <0.00%> (ø)
analysis/topEFT/sow_processor.py 0.00% <ø> (ø)
... and 18 more

... and 4 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

kmohrman commented 1 year ago

@Andrew42 do you understand why the CI is failing? It seems to be on the comparison against the ref data card. I would have thought that any differences introduced by the changes in this branch would be smaller than the threshold used in the comparison ( 0.5e-5).

kmohrman commented 1 year ago

@Andrew42 what are you intentions with this PR?