Plot refactoring

Changes:

Improved naming scheme
- make_plots.py --> make_hists.py
- merge_plots.py --> merge_ntuples.py
- ggF offline AN and paper notebooks renamed to include offline_ in the name
- submit.py and READMEs modified to reflect naming scheme changes
Improved loader() of hists for plotting in plot_utils.py, which cleans up a lot of the plotting code (see example below)
plot.ipynb now contains an example of the latest plotting changes and is meant as a reference only: each analysis should have its own set of notebooks/scripts for plotting
Added metadata to the histogram .root files that make_hists.py produces. loader() now uses this metadata by default to apply lumis.
Removed some old plotting scripts
- multithread.py: this has been superseded by submit.py
- ExtendedABCD.ipynb: this is now done in plot.ipynb as an example; the code in here was outdated by several months
- quick_plot.ipynb: this was very outdated and broken and served no real purpose
SUEP xsecs are now included in each of the years; removed xsections_SUEP.json
make_hists.py has been cleaned up. Key points:
- adapted to run for different channels
- histograms are now defined in hist_defs.py
- the parser is defined through a function so we can call it from submit.py (this caused issues before, when one script was changed and not the other). Added some options to the parser
- track killing and JECs are now treated like the other systematics. Before, we were treating them separately, in a way that didn't make much sense
- new function to make new variables on the fly from existing dataframe columns
Updated plotting/README.md to reflect the above changes and to better document what needs to be configured if someone just wants to make histograms, versus if someone wants to modify the code.
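The shared-parser point in the make_hists.py list can be illustrated with a minimal sketch. The function name `make_parser` and the options shown (`--tag`, `--channel`, `--doSyst`, `--dryrun`) are hypothetical and not necessarily the ones defined in make_hists.py. The only idea being shown is that both scripts build their arguments from a single function, so the two definitions cannot drift apart.

```python
import argparse


def make_parser(parser=None):
    """Build (or extend) the argument parser shared by both scripts.

    Hypothetical sketch: the option names are illustrative only.
    """
    if parser is None:
        parser = argparse.ArgumentParser(description="Histogram maker options")
    parser.add_argument("--tag", type=str, required=True, help="production tag")
    parser.add_argument("--channel", type=str, default="ggF", help="analysis channel")
    parser.add_argument("--doSyst", type=int, default=0, help="run systematics (0/1)")
    return parser


# In the histogram script one would parse the command line directly:
# options = make_parser().parse_args()

# In the submission script, reuse the same definition and add
# submission-only options on top of it.
submit_parser = make_parser()
submit_parser.add_argument("--dryrun", action="store_true", help="print jobs only")
options = submit_parser.parse_args(["--tag", "example", "--dryrun"])
print(options.tag, options.channel, options.dryrun)
```

With this layout, an option added inside `make_parser` becomes available to both scripts at once.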

Affected Code: Notebooks and scripts should experience no functional changes; all diffs are new features, naming changes, or modularization. Two things have changed more than the others, though even there most of the changes are modularization rather than rewriting.

1. loader(): loader() is still backward compatible for now, but I would like to eventually drop that support. The only change users might need to make is to the options passed to loader(), since the default flags have changed; hopefully this will not be very disruptive. Loading in general can now be simplified thanks to the new features of loader(). After running the new make_hists.py, which generates the metadata, instead of the previous laborious process:

offline_files_2018 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2018_JetHT_A02_offline.txt')
offline_files_2017 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2017_JetHT_A02_offline.txt')
offline_files_2016 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2016_JetHT_A02_offline.txt')
offline_files_SUEP_2018 = getHistLists(plotDir, 'approval', '../filelist/Offline/list_full_signal_offline.txt')

plots_SUEP_2018 = loader(offline_files_SUEP_2018, year='2018')
plots_2018 = loader(offline_files_2018, auto_lumi=True)
plots_2017 = loader(offline_files_2017, auto_lumi=True)
plots_2016 = loader(offline_files_2016, auto_lumi=True)

plots = {}

def applyNormalizationToSUEPSamples(files, plots_SUEP):   
    output = {}
    for file, sample in zip(files, plots_SUEP.keys()):
        file = file.split("/")[-1].split("13TeV")[0]+'13TeV-pythia8'
        xsection = fill_utils.getXSection(file, 'SUEP')
        output[sample] = fill_utils.apply_normalization(plots_SUEP[sample].copy(), xsection)
    return output

plots_SUEP_2018 = applyNormalizationToSUEPSamples(offline_files_SUEP_2018, plots_SUEP_2018)
for key in plots_SUEP_2018.keys(): plots[key+"_2018"] = plots_SUEP_2018[key].copy()
for key in plots_2018.keys(): plots[key+"_2018"] = plots_2018[key].copy()
for key in plots_2017.keys(): plots[key+"_2017"] = plots_2017[key].copy()
for key in plots_2016.keys(): plots[key+"_2016"] = plots_2016[key].copy()

combineYears(plots, 'data', ['2016', '2017', '2018'])

Now one can simply do

offline_files_2018 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2018_JetHT_A02_offline.txt')
offline_files_2017 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2017_JetHT_A02_offline.txt')
offline_files_2016 = getHistLists(plotDir, 'unblind', '../filelist/Offline/list_2016_JetHT_A02_offline.txt')
offline_files_SUEP_2018 = getHistLists(plotDir, 'approval', '../filelist/Offline/list_full_signal_offline.txt')
all_files = offline_files_2018 + offline_files_2017 + offline_files_2016 + offline_files_SUEP_2018

plots = loader(all_files, by_bin=True, by_year=True)
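To make the idea behind the simplified call concrete, here is a toy sketch of a metadata-aware loader. It is not the actual loader() from plot_utils.py. The function name `load_with_metadata`, the metadata fields (`sample`, `year`, `isMC`), and the luminosity numbers are all assumptions made for illustration, and histograms are replaced by plain floats so the example runs without any .root files. It keeps the `sample_year` key convention used in the older code above.

```python
# Placeholder luminosities, NOT real values.
LUMIS = {"2016": 10.0, "2017": 20.0, "2018": 30.0}


def load_with_metadata(files, by_year=True):
    """Toy stand-in for a metadata-aware loader (hypothetical).

    Each "file" is a dict with a 'metadata' entry and a 'hists' entry;
    histograms are plain floats here.
    """
    plots = {}
    for f in files:
        meta = f["metadata"]
        # Data is left unscaled; simulation is scaled to the year's luminosity.
        scale = LUMIS[meta["year"]] if meta["isMC"] else 1.0
        # Group by sample and year, or by sample only.
        key = (meta["sample"] + "_" + meta["year"]) if by_year else meta["sample"]
        sample_hists = plots.setdefault(key, {})
        for name, value in f["hists"].items():
            # Sum files that belong to the same group.
            sample_hists[name] = sample_hists.get(name, 0.0) + value * scale
    return plots


files = [
    {"metadata": {"sample": "data", "year": "2018", "isMC": False}, "hists": {"ht": 5.0}},
    {"metadata": {"sample": "QCD", "year": "2018", "isMC": True}, "hists": {"ht": 0.5}},
    {"metadata": {"sample": "QCD", "year": "2018", "isMC": True}, "hists": {"ht": 1.5}},
]
plots = load_with_metadata(files)
print(plots)
```

Because the year and sample type travel with each file, the caller no longer has to pass `year=` or normalize each sample by hand.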

2. make_hists.py (previously make_plots.py): since this has been split into make_hists.py and hist_defs.py, with some things cleaned up or moved around, I wanted to be sure that I didn't break anything. I compared the histograms produced by the old and new code for the same sample:

import uproot

fNew = uproot.open("/data/submit/lavezzo/SUEP/outputs//QCD_HT700to1000_TuneCP5_PSWeights_13TeV-madgraph-pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2+MINIAODSIM_example_upgrade2.root")
fOld = uproot.open("/data/submit/lavezzo/SUEP/outputs//QCD_HT700to1000_TuneCP5_PSWeights_13TeV-madgraph-pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2+MINIAODSIM_example_old.root")

new_keys = set(fNew.keys())
for k in fOld.keys():
    # keys containing 'MET' are excluded from the comparison
    if 'MET' in k: continue
    # every other key in the old file must also exist in the new one
    if k not in new_keys:
        print("Missing key: ", k)
        continue
    # metadata entries only need to exist; their contents are not compared
    if 'metadata' in k: continue
    h1 = fOld[k].to_hist()
    h2 = fNew[k].to_hist()
    if h1 != h2:
        print("Different histograms: ", k)

This shows that the only difference is that I removed the ABCDvars histogram (it was not used anywhere, and I'm not sure it was even correct anymore). All previously existing histograms still exist and are identical in the new code.
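For future refactors, the same check could be wrapped in a small reusable function. The sketch below is an assumption, not code from this PR: `compare_outputs` and its parameters are hypothetical names. It works on any pair of mappings, so plain dicts stand in for the opened files here. With uproot one would pass the two opened files and `convert=lambda obj: obj.to_hist()`.

```python
def compare_outputs(old, new, ignore=("MET",), exists_only=("metadata",),
                    convert=lambda obj: obj):
    """Compare two mappings of histograms key by key (hypothetical helper).

    Returns (missing, different):
      missing   - keys of `old` that are absent from `new`
      different - keys present in both whose converted values differ
    Keys containing a substring in `ignore` are skipped entirely; keys
    containing a substring in `exists_only` must exist but are not compared.
    """
    missing, different = [], []
    new_keys = set(new.keys())
    for key in old.keys():
        if any(s in key for s in ignore):
            continue
        if key not in new_keys:
            missing.append(key)
            continue
        if any(s in key for s in exists_only):
            continue
        if convert(old[key]) != convert(new[key]):
            different.append(key)
    return missing, different


# Plain dicts standing in for the old and new output files.
old = {"ht;1": [1, 2], "ABCDvars;1": [0], "MET_pt;1": [3], "metadata;1": "a"}
new = {"ht;1": [1, 2], "metadata;1": "b"}
missing, different = compare_outputs(old, new)
print("Missing keys:", missing)
print("Different histograms:", different)
```

Returning the two lists instead of printing inside the loop makes it straightforward to turn the check into an assertion in a test.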

SUEPPhysics / SUEPCoffea_dask

Plot refactoring #279