LDMX-Software / ldmx-sw

The Light Dark Matter eXperiment simulation and reconstruction framework.
https://ldmx-software.github.io
GNU General Public License v3.0
21 stars 19 forks source link

Streamlined validation code #1114

Open bryngemark opened 1 year ago

bryngemark commented 1 year ago

Is your feature request related to a problem? Please describe. It would be great to have a streamlined plotting of the DQM plots produced in the GitHub validation action, as well as other standard validation plots each subsystem needs.

Describe the solution you'd like @tomeichlersmith has started making some of this for Ecal and I'd like to import that to ldmx-sw. His repo is already set up to run comparisons across versions, and is equipped to use with notebooks. The idea would be to have the plotting tool be general, with the possibility to run system specific simulations (e.g. what type of beam you'd like and starting from where), and list the histograms you want to plot.

Describe alternatives you've considered That we start something from scratch.

Additional context This validation plot suite could also be included as the end step of our automatized GitHub validation action triggered by PRs. This way we'd make it easier to actually inspect the gold/new comparison plots.

tvami commented 3 weeks ago

Does this issue just mean that we should use https://github.com/LDMX-Software/ldmx-sw/tree/trunk/Validation in the CI?

tomeichlersmith commented 3 weeks ago

I'm not sure if that was the original intent, but I do think it would be a good idea to have the plotting code used in the CI be something that humans can use directly as well.

We would need to use NumPy's KS test rather than ROOT's which isn't a large problem. The PyROOT script I wrote for the CI can be used as a potential reference although we could probably be a bit more careful and make sure the plots are prettier and maybe combine them into a slide-deck rather than a directory for easier browsing by reviewers.

bryngemark commented 3 weeks ago

@tvami yes this is what i had in mind, originally, and i still think it's a good idea. as @tomeichlersmith implies, making human interactions with the CI output easier would be a good step forward.

it would be particularly amazing if failed test plots could be attached to the PR by the action, in some non-obscure location (avoiding the whole artifact downloading, untarring, etc), perhaps even to the thread itself? we're not penalized for using large amounts of data for our discussions, right?

tomeichlersmith commented 3 weeks ago

We are not penalized directly for using large amounts of data within discussions, but we are prevented from doing so indirectly.

As far as I can tell, GitHub does not allow programmatic uploading of files to PRs/Issues/Discussions. The easiest evidence for this is probably the fact that GitHub's own CLI program is able to make a comment on an issue but is not able to upload any files while making the comment.[^1] This isn't a technical issue because we can upload files with certain extensions to GitHub PR/Issue comments through the browser and you can even upload files to attach to releases with gh release upload.

So how could we get around this? Well, you may notice that when you upload an image to GitHub, it stores it in under some unique ID within github.com/user-attachments/{files,assets} (assets for images and files for everything else) and then just inserts a link to that location for viewing the image (or downloading if the file is not an image). For example,

![DM Plushie](https://www.particlezoo.net/cdn/shop/products/dark_matter_c0dfc7d0-e4c4-4063-8bc3-f8563646dc69_1024x1024.jpg?v=1491285981)

looks like

DM Plushie

So, we need a host that allows us to programmatically upload images while also making those images publicly viewable so we can then link to them on GitHub. This is how other organizations enable attachment of validation plots/files onto GitHub. They aren't attaching plots/files to GitHub, they are just linking to a known location that they have access to.

Options for Us

Summary

We are reaching the end of what a free-tier, no self-host option gets us. If we want to make the viewing of plots more ergonomic, we almost certainly need to use self-hosting in some way.

[^1]: The manual for gh issue comment implies this. There is no "upload this file" option. gh pr comment is similar.