SUEPPhysics / SUEPCoffea_dask

SUEP analysis using coffea with fastjet. Uses Dask for batch submissions

Plot refactoring #279

Closed lucalavezzo closed 8 months ago

lucalavezzo commented 9 months ago

Changes:

Affected code: notebooks and scripts should see no functional changes; all diffs are new features, renames, or modularization. Two pieces have changed more than the rest, though even there most of the changes are modularization rather than rewriting.

1. loader(): loader() is still backwards compatible for now, but I would like to drop that support eventually. The only change users might need to make is to the options passed to loader(), since the default flags have changed; hopefully this will not be very disruptive. Loading in general can now be simplified thanks to the new features of loader(). After running the new make_hists.py, which generates metadata, instead of the previous laborious process:

offline_files_2018 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2018_JetHT_A02_offline.txt')
offline_files_2017 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2017_JetHT_A02_offline.txt')
offline_files_2016 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2016_JetHT_A02_offline.txt')
offline_files_SUEP_2018 = getHistLists(plotDir, 'approval', '../filelist/Offline/list_full_signal_offline.txt')

plots_SUEP_2018 = loader(offline_files_SUEP_2018, year='2018')
plots_2018 = loader(offline_files_2018, auto_lumi=True)
plots_2017 = loader(offline_files_2017, auto_lumi=True)
plots_2016 = loader(offline_files_2016, auto_lumi=True)

plots = {}

def applyNormalizationToSUEPSamples(files, plots_SUEP):   
    output = {}
    for file, sample in zip(files, plots_SUEP.keys()):
        file = file.split("/")[-1].split("13TeV")[0]+'13TeV-pythia8'
        xsection = fill_utils.getXSection(file, 'SUEP')
        output[sample] = fill_utils.apply_normalization(plots_SUEP[sample].copy(), xsection)
    return output

plots_SUEP_2018 = applyNormalizationToSUEPSamples(offline_files_SUEP_2018, plots_SUEP_2018)
for key in plots_SUEP_2018.keys(): plots[key+"_2018"] = plots_SUEP_2018[key].copy()
for key in plots_2018.keys(): plots[key+"_2018"] = plots_2018[key].copy()
for key in plots_2017.keys(): plots[key+"_2017"] = plots_2017[key].copy()
for key in plots_2016.keys(): plots[key+"_2016"] = plots_2016[key].copy()

combineYears(plots, 'data', ['2016', '2017', '2018'])

Now one can simply do:

offline_files_2018 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2018_JetHT_A02_offline.txt')
offline_files_2017 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2017_JetHT_A02_offline.txt')
offline_files_2016 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2016_JetHT_A02_offline.txt')
offline_files_SUEP_2018 = getHistLists(plotDir, 'approval', '../filelist/Offline/list_full_signal_offline.txt')
all_files = offline_files_2018 + offline_files_2017 + offline_files_2016 + offline_files_SUEP_2018

plots = loader(all_files, by_bin=True, by_year=True)
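For readers who want to see what the old snippet was doing by hand, here is a minimal sketch of the per-year bookkeeping: suffix every sample key with its year, then sum one sample across years. This is not the code inside loader() or combineYears(); `tag_by_year` and `combine_years` are hypothetical helpers written for illustration, and plain floats stand in for histograms (anything that supports `+` would do).

```python
def tag_by_year(plots_per_year):
    """Flatten {year: {sample: {hist_name: hist}}} into {sample_year: {...}}.

    Hypothetical helper; mirrors the old `plots[key + "_2018"] = ...` loops.
    """
    merged = {}
    for year, plots in plots_per_year.items():
        for sample, hists in plots.items():
            # Copy the inner dict so the per-year inputs are left untouched.
            merged[f"{sample}_{year}"] = dict(hists)
    return merged


def combine_years(plots, sample, years):
    """Store under plots[sample] the sum of plots[sample_year] over `years`.

    Hypothetical helper; assumes each histogram-like value supports `+`.
    """
    combined = {}
    for year in years:
        for name, h in plots[f"{sample}_{year}"].items():
            combined[name] = combined[name] + h if name in combined else h
    plots[sample] = combined
    return plots


# Toy input: floats stand in for histograms.
per_year = {
    "2016": {"data": {"ht": 1.0}},
    "2017": {"data": {"ht": 2.0}},
    "2018": {"data": {"ht": 3.0}},
}
plots = tag_by_year(per_year)
combine_years(plots, "data", ["2016", "2017", "2018"])
```

After this runs, `plots` holds one entry per sample and year plus a combined `data` entry, which is the shape the old snippet built manually.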

2. make_hists.py (previously make_plots.py): Since this has been broken up into make_hists.py and hist_defs.py, with some things cleaned up or moved around, I wanted to be sure that I didn't break anything, so I compared the output of the old and new code for the same sample:

import uproot

fNew = uproot.open("/data/submit/lavezzo/SUEP/outputs//QCD_HT700to1000_TuneCP5_PSWeights_13TeV-madgraph-pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2+MINIAODSIM_example_upgrade2.root")
fOld = uproot.open("/data/submit/lavezzo/SUEP/outputs//QCD_HT700to1000_TuneCP5_PSWeights_13TeV-madgraph-pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2+MINIAODSIM_example_old.root")

for k in fOld.keys():
    if 'MET' in k: continue
    allGood = False
    for j in fNew.keys():
        if k == j:
            allGood = True
            if 'metadata' in k: continue
            h1 = fOld[k].to_hist()
            h2 = fNew[j].to_hist()
            if h1 != h2:
                print("Different histograms: ", k)
            break
    if not allGood:
        print("Missing key: ", k)

This shows that the only difference is that I removed the ABCDvars histogram (unused anywhere, and I'm not sure it was even correct anymore). All other previously existing histograms still exist and are identical in the new code.
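The same check can be sketched as a small reusable function without the nested loop. This is only an illustration under assumptions: `compare_outputs`, `ignore`, and `exists_only` are hypothetical names, and plain dicts stand in for the opened files. It keeps the behaviour of the loop above: keys containing 'MET' are skipped entirely, and keys containing 'metadata' are checked for existence but not compared.

```python
def compare_outputs(old, new, ignore=("MET",), exists_only=("metadata",)):
    """Return (missing, different) key lists between two mappings.

    Hypothetical helper: `old` and `new` map histogram names to objects
    that support `!=`.
    """
    missing, different = [], []
    for key, value in old.items():
        if any(tag in key for tag in ignore):
            continue  # skipped entirely, as with 'MET' above
        if key not in new:
            missing.append(key)
        elif not any(tag in key for tag in exists_only) and value != new[key]:
            different.append(key)
    return missing, different


# Toy stand-ins for the old and new outputs.
old = {"ht;1": [1, 2], "MET_pt;1": [0], "metadata;1": "a", "ABCDvars;1": [5]}
new = {"ht;1": [1, 2], "metadata;1": "b"}
missing, different = compare_outputs(old, new)
print("Missing keys:", missing)
print("Different histograms:", different)
```

With real files, the values would be the histograms converted as in the loop above.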