SUEPPhysics / SUEPCoffea_dask

SUEP analysis using coffea with fastjet. Uses Dask for batch submissions

adding lepton ID/isolation/dxy vars, TopPT->HighestPT, and tiny typo fixes #276

Closed jreic closed 9 months ago

jreic commented 9 months ago

Hi @pmlugato,

When you have a chance, take a look and see if this is okay for you to merge into your branch (on the road to then merging back into master). Mostly added the extra lepton branches, which we might clean up in time after we study which lepton definitions we want to use.

And let me know once you go and rerun over all of the samples with these included. I may go run over some of the signals + backgrounds to start studying our optimal lepton choices, but if you end up processing them all sooner, I could just use your hdf5 files rather than duplicating the work.

Thanks! Joey

lucalavezzo commented 9 months ago

We typically don't push changes to the plotting notebook unless something isn't working; it's meant to be a sort of starting point for playing around with the histograms. Are the changes important? (GitHub doesn't display diffs well for notebooks, either.)

jreic commented 9 months ago

It should be purely TopPT -> HighestPT in the notebook, which will otherwise fail with new hdf5 files (I think of it as a schema evolution).
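A minimal sketch of what this kind of rename could look like on the reading side, assuming the histograms end up in a plain dict keyed by name and that the label appears as a substring of each key. The function `migrate_labels` and the key `SUEP_nconst_TopPT` are hypothetical and only illustrate the idea; they are not taken from the repo.

```python
def migrate_labels(plots, old="TopPT", new="HighestPT"):
    """Return a copy of `plots` with `old` replaced by `new` in every key.

    `plots` is assumed to be a dict mapping histogram names to histograms.
    The input dict is left untouched.
    """
    migrated = {}
    for name, hist in plots.items():
        new_name = name.replace(old, new)
        # Refuse to silently overwrite if old- and new-style keys both exist.
        if new_name in migrated:
            raise KeyError(f"both old and new names present for {new_name!r}")
        migrated[new_name] = hist
    return migrated


# Hypothetical keys, standing in for histograms read from an older file.
old_plots = {"SUEP_nconst_TopPT": "hist_a", "ntracks": "hist_b"}
new_plots = migrate_labels(old_plots)
```

With something like this, files written before the rename could still be read by a notebook that only knows the new label, at the cost of one extra pass over the keys.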

lucalavezzo commented 9 months ago

> It should be purely TopPT -> HighestPT in the notebook, which will otherwise fail with new hdf5 files (I think of it as a schema evolution).

The notebook is not meant to be specific to any analysis. Perhaps we could completely scrub it of all specific mentions of samples/plots and just leave a sort of "# put samples here" and "# put plot names here". Pietro and I were thinking to improve this a bit, maybe also just grabbing the automatic plotting part and putting it into a script.

chadfreer commented 9 months ago

> It should be purely TopPT -> HighestPT in the notebook, which will otherwise fail with new hdf5 files (I think of it as a schema evolution).
>
> The notebook is not meant to be specific to any analysis. Perhaps we could completely scrub it of all specific mentions of samples/plots and just leave a sort of "# put samples here" and "# put plot names here". Pietro and I were thinking to improve this a bit, maybe also just grabbing the automatic plotting part and putting it into a script.

I agree with Joey on this. This naming scheme is confusing. The notebook is only replacing the name. Is there a specific dependency that will break here?

lucalavezzo commented 9 months ago

> It should be purely TopPT -> HighestPT in the notebook, which will otherwise fail with new hdf5 files (I think of it as a schema evolution).
>
> The notebook is not meant to be specific to any analysis. Perhaps we could completely scrub it of all specific mentions of samples/plots and just leave a sort of "# put samples here" and "# put plot names here". Pietro and I were thinking to improve this a bit, maybe also just grabbing the automatic plotting part and putting it into a script.
>
> I agree with Joey on this. This naming scheme is confusing. The notebook is only replacing the name. Is there a specific dependency that will break here?

No no, there are no dependencies. I was just thinking about how we want to have this notebook set up in the central repo, in the main branch. We can also have a different notebook for each analysis. All the functions should be in plot_utils.py; in my opinion, the notebook on GitHub should just be a starting place for people to play around with histograms.
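As a rough illustration of an analysis-agnostic starting cell, here is a sketch that keeps the sample and plot lists as empty placeholders. It assumes the loaded histograms sit in a nested dict (sample, then plot name); the actual structure returned by plot_utils.py is not shown in this thread, and `collect` is a hypothetical helper, not an existing function.

```python
SAMPLES = []      # put samples here
PLOT_NAMES = []   # put plot names here


def collect(plots, samples, plot_names):
    """Pick the requested histograms and list any that are missing.

    `plots` is assumed to map sample name -> {plot name -> histogram}.
    Returns (found, missing): a dict keyed by (sample, plot name) and a
    list of the (sample, plot name) pairs that could not be found.
    """
    found, missing = {}, []
    for sample in samples:
        for name in plot_names:
            try:
                found[(sample, name)] = plots[sample][name]
            except KeyError:
                missing.append((sample, name))
    return found, missing


# Toy input standing in for whatever the loading step returns.
toy_plots = {"sampleA": {"ntracks": "hist_a"}}
found, missing = collect(toy_plots, ["sampleA", "sampleB"], ["ntracks"])
```

Reporting missing entries instead of raising means a renamed plot shows up as a short list to fix rather than a failed cell.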

jreic commented 9 months ago

Pietro says that he took a look at the changes w.r.t. his branch and that he will resolve these other items before a PR to the master branch. I took the important items and put them into a checklist here: https://github.com/SUEPPhysics/SUEPCoffea_dask/issues/278

(we can figure out what to do with the notebook at that time too)

Any other thoughts, or should we just go merge it?