epic-open-source / seismometer

AI model evaluation with a focus on healthcare
https://epic-open-source.github.io/seismometer/
BSD 3-Clause "New" or "Revised" License

Meta: Create pre-commit hook to clean example notebooks #44

Open diehlbw opened 1 month ago

diehlbw commented 1 month ago

Is your feature request related to a problem? Please describe

Tracking ipynb files is inherently error-prone, as diffs of output cells can be very hard to read.
On the other hand, it's rarely necessary/useful to track the outputs themselves.

Describe the solution you'd like

Create a pre-commit hook that prunes output cells from the example-notebooks *.ipynb files so that the outputs are not committed.
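To make "pruning output cells" concrete, here is a minimal sketch using only the standard library. It assumes notebooks in the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the function names `strip_outputs` and `strip_file` are hypothetical and not part of seismometer.

```python
import json


def strip_outputs(notebook: dict) -> bool:
    """Clear outputs and execution counts of code cells in place.

    Returns True if anything was changed, so a hook can report it.
    Assumes the nbformat v4 layout (top-level "cells" list).
    """
    changed = False
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells have no outputs
        if cell.get("outputs") or cell.get("execution_count") is not None:
            cell["outputs"] = []
            cell["execution_count"] = None
            changed = True
    return changed


def strip_file(path: str) -> bool:
    """Strip one .ipynb file on disk; rewrite it only if it changed."""
    with open(path, encoding="utf-8") as fh:
        notebook = json.load(fh)
    changed = strip_outputs(notebook)
    if changed:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(notebook, fh, indent=1, ensure_ascii=False)
            fh.write("\n")
    return changed
```

A hook built on this could call `strip_file` for each staged notebook and fail the commit when any call returns True, so the author re-stages the cleaned files.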

diehlbw commented 1 month ago

Some initial thoughts.

pre-commit is already used for linting and similar checks, but adding plugins requires a substantial level of trust, since what a given repo+rev points to is potentially overridable by the plugin's owners.
Without that level of trust, it is likely better to:

The last bullet is about as effective as the more integrated approach, though it needs more custom workflow code. The script itself is pretty straightforward: it just runs nbconvert in place and then uses git diff to exit with a failure if the file(s) changed (<10 lines for our basic wants).
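A rough sketch of such a script, written in Python rather than shell. It assumes `jupyter nbconvert` (with its `--clear-output` and `--inplace` flags) and `git` are on the PATH; the helper names and the injectable `run` parameter are hypothetical, added only so the logic can be exercised without either tool installed.

```python
import subprocess


def clear_command(paths):
    """nbconvert invocation that clears outputs in place."""
    return ["jupyter", "nbconvert", "--clear-output", "--inplace", *paths]


def diff_command(paths):
    """git diff exits with 1 when the working tree differs from the index."""
    return ["git", "diff", "--exit-code", "--name-only", "--", *paths]


def check_notebooks(paths, run=subprocess.run):
    """Clear outputs, then return git diff's exit code (0 means clean)."""
    if not paths:
        return 0
    run(clear_command(paths), check=True)  # raises if nbconvert itself fails
    return run(diff_command(paths)).returncode


# As a script entry point one might write:
#     import sys
#     sys.exit(check_notebooks(sys.argv[1:]))
```

Because `git diff` compares the working tree against the index, a notebook staged with outputs shows up as changed once nbconvert has cleared it, and the nonzero exit code blocks the commit or fails the CI step.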

gbowlin commented 3 weeks ago

In #55, I am adding a step to make docs that will clear out all notebook contents. It's not a pre-commit hook, but it will run in the CI/CD pipeline.