TopEFT / topeft

15 stars 24 forks source link

Fix statistical uncertainty in datacards and crop negative bins to zero #323

Closed sscruz closed 2 years ago

sscruz commented 2 years ago
codecov[bot] commented 2 years ago

Codecov Report

Merging #323 (0ceefc9) into master (752826c) will decrease coverage by 13.77%. The diff coverage is 87.87%.

@@             Coverage Diff             @@
##           master     #323       +/-   ##
===========================================
- Coverage   37.98%   24.20%   -13.78%     
===========================================
  Files          47       47               
  Lines        7240     7251       +11     
===========================================
- Hits         2750     1755      -995     
- Misses       4490     5496     +1006     
Flag Coverage Δ
unittests 24.20% <87.87%> (-13.78%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
topcoffea/modules/datacard_tools.py 71.05% <86.20%> (-0.02%) :arrow_down:
analysis/topEFT/make_cards.py 58.43% <100.00%> (+0.25%) :arrow_up:
topcoffea/modules/selection.py 0.00% <0.00%> (-90.38%) :arrow_down:
topcoffea/modules/corrections.py 0.00% <0.00%> (-85.23%) :arrow_down:
topcoffea/modules/objects.py 0.00% <0.00%> (-80.22%) :arrow_down:
analysis/topEFT/topeft.py 9.40% <0.00%> (-72.87%) :arrow_down:
topcoffea/modules/get_renormfact_envelope.py 0.00% <0.00%> (-13.93%) :arrow_down:
topcoffea/modules/dataDrivenEstimation.py 69.16% <0.00%> (-4.17%) :arrow_down:
topcoffea/modules/GetValuesFromJsons.py 32.14% <0.00%> (-3.58%) :arrow_down:
topcoffea/modules/utils.py 20.97% <0.00%> (-2.80%) :arrow_down:
... and 2 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

kmohrman commented 2 years ago

@Andrew42 and @sscruz, I am wondering about the plan regarding these style comments. Do either (or both) of you want to (and have time to) implement these updates as part of this PR? Or is this something that will be pushed back to an issue to be addressed later?

Andrew42 commented 2 years ago

@kmohrman the style comments are pretty simple/minor, so I could do them myself, but as this isn't my PR I don't want to commit changes w/o @sscruz's acknowledgement or feedback

kmohrman commented 2 years ago

Hi @Andrew42 and @sscruz, just wondering what the plan is for addressing the minor style comments? Is that something we want to do as part of this PR? If so, @Andrew42 would you plan to implement them?

Andrew42 commented 2 years ago

@sscruz has implemented the requested changes and I'm fine with having this PR merged

Andrew42 commented 2 years ago

I ran a comparison of the datacard maker outputs from master (f3) and the changes from commit 0ceefc9 (f2) and compared against the outputs used to generate the most recent workspace, which should correspond to this PR prior to commit 2e8de7f (f1):

fpath_1 = "/scratch365/kmohrman/fullR2_files/anatest21v01/ptz-lj0pt_withSys"
fpath_2 = "/scratch365/awightma/datacard_tests/fixNaNuncertainties"
fpath_3 = "/scratch365/awightma/datacard_tests/master_61eb321d"

km = "lj0pt"
ch = "2lss_4t_m_4j"
fname = f"ttx_multileptons-{ch}_{km}.root"

f1 = uproot.open(os.path.join(fpath_1,fname))
f2 = uproot.open(os.path.join(fpath_2,fname))
f3 = uproot.open(os.path.join(fpath_3,fname))

h1 = f1["fakes_sm"]
h2 = f2["fakes_sm"]
h3 = f3["fakes_sm"]

h1.values() # array([1.23129159, 1.50385304, 0.16989791, 0.        ])
h2.values() # array([1.23129159, 1.50385304, 0.16989791, 0.        ])
h3.values() # array([1.23129159, 1.50385304, 0.16989791, -0.03409792])

h1.variances() # array([0.21787236, 0.25944065, 0.04309692, 0.        ])
h2.variances() # array([0.21787236, 0.25944065, 0.04309692, 0.        ])
h3.variances() # array([1.23129159, 1.50385304, 0.16989791, 0.        ])

As expected, the output from master generates bins with negative events, while using the version from this PR the bin is correctly zero'd out. Furthermore, the style changes did not affect the outputs (as expected).

I can run this comparison over more channels if we think it's necessary, but from this I'd say it would be safe to merge

sscruz commented 2 years ago

hi, I ran a similar validation with the script below, getting also no changes in all bins of our datacards, besides the fakes_sm

import ROOT as r
import os

import re
pattern=re.compile('(?P<proc>.*)_quad_mixed_(?P<wc1>.*)_(?P<wc2>.*)')
for l in os.listdir('.'):
    if not l.endswith('.root'): continue
    if 'njet' in l: continue
    if '3l' in l and 'lj0pt' in l: continue
    tf=r.TFile.Open(l)
    alternative_tf=r.TFile.Open('/work/kmohrman/CMSSW_Releases/CMSSW_10_2_13/src/EFTFit/Fitter/test/datacards/fullR2/anatest19/at19v01/ptz-lj0pt_withSys/%s'%l)
    if not alternative_tf:
        continue
    for key in  tf.GetListOfKeys():
        hist = key.ReadObj()
        hist_name=hist.GetName()
        if not alternative_tf.Get(hist_name):
            # try something                                                                                                                                                                                        
            match = pattern.match(hist_name)
            if  match:
                hist_name = '%s_quad_mixed_%s_%s'%(match.group('proc'), match.group('wc2'), match.group('wc1'))

        if not alternative_tf.Get(hist_name):
            print("Histo %s not found in file %s"%(hist_name, l))
            print('the integral is', hist_name)
            continue
        alt_hist=alternative_tf.Get(hist_name)

        for i in range(1,hist.GetNbinsX()+1):
            if abs(alt_hist.GetBinContent(i)-hist.GetBinContent(i)) < 1e-10: continue
            print(hist_name, alt_hist.GetBinContent(i)-hist.GetBinContent(i))
kmohrman commented 2 years ago

Thank you @sscruz and @Andrew42. It looks like all of the comments on this PR have been addressed, and that everything has been double checked by both of you. Also @bryates has approved the PR. So I will go ahead and merge.