TopEFT / topeft

15 stars 24 forks source link

Adding statistical uncertainties #284

Closed sscruz closed 2 years ago

sscruz commented 2 years ago

Adds the option for including uncertainties associated to the statistical uncertainties of backgrounds using the autoMCstats option from combine.

codecov[bot] commented 2 years ago

Codecov Report

Merging #284 (93e039c) into master (ab50f25) will decrease coverage by 2.78%. The diff coverage is 85.36%.

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
- Coverage   29.39%   26.60%   -2.79%     
==========================================
  Files          42       42              
  Lines        6699     6728      +29     
==========================================
- Hits         1969     1790     -179     
- Misses       4730     4938     +208     
Flag Coverage Δ
unittests 26.60% <85.36%> (-2.79%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
topcoffea/modules/dataDrivenEstimation.py 68.90% <80.00%> (-3.13%) :arrow_down:
analysis/topEFT/datacard_maker.py 72.48% <84.21%> (+0.23%) :arrow_up:
topcoffea/modules/HistEFT.py 62.67% <88.23%> (+3.05%) :arrow_up:
topcoffea/modules/corrections.py 0.00% <0.00%> (-35.39%) :arrow_down:
topcoffea/modules/objects.py 0.00% <0.00%> (-20.88%) :arrow_down:
topcoffea/modules/QuadFitTools.py 0.00% <0.00%> (-16.93%) :arrow_down:
topcoffea/modules/selection.py 0.00% <0.00%> (-6.96%) :arrow_down:
topcoffea/modules/GetValuesFromJsons.py 32.14% <0.00%> (-1.79%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ab50f25...93e039c. Read the comment docs.

kmohrman commented 2 years ago

Ok, looks like all of the questions have been resolved. I'll go ahead and test it out in the full workflow now.

kmohrman commented 2 years ago

Ah, just thought of something, would it make sense to include the update to the list of prompt subtraction samples as part of this PR? We discussed it here and concluded that it should not really make much difference, but it seems like it might be worthwhile to include in this PR since the prompt subtraction happens in the data driven estimator, and this PR is already making updates to that script (though the actual updates for this changes will be in the params.json file).

bryates commented 2 years ago

Ah, just thought of something, would it make sense to include the update to the list of prompt subtraction samples as part of this PR? We discussed it here and concluded that it should not really make much difference, but it seems like it might be worthwhile to include in this PR since the prompt subtraction happens in the data driven estimator, and this PR is already making updates to that script (though the actual updates for this changes will be in the params.json file).

Seems like a good addition to me.

sscruz commented 2 years ago

hi, I just pushed much cleaner implementation. I've implemented a function copy_sm that returns a HistEFT with no WCs that contains the yields at the SM point. sumw2 is hardcoded to zero but this is the behavior I intended at this point

kmohrman commented 2 years ago

Ad discussed on MM here, the effects of including the stat uncertainties is very small (generally <1%, though the largest effect was 2%) and we are ready to merge.

One unexpected effect of running with the auto mc stats is that it apparently results in smaller workspaces (by about 20%) and shorter runtimes (by more than a factor of 2 for the differential case). These effects seem somewhat unintuitive, but we cannot point to anything that is actually wrong, so we have agreed to move ahead.