NSLS-II-SST / rsoxs_scans

Standalone package for loading and validating RSoXS scans for the NSLS-II SST-1 beamline
BSD 3-Clause "New" or "Revised" License

Trim Jupyter Notebook data #11

Closed BijalBPatel closed 1 year ago

BijalBPatel commented 1 year ago

Challenge: Committing Jupyter notebooks to git can produce very large files, because the cell outputs are committed along with the code.

For unrelated projects, I've addressed this by using pre-commit (https://pre-commit.com), following the protocol here:

Todo: consider whether this needs follow-up at all and elaborate on process.

BijalBPatel commented 1 year ago

Trialing this approach instead. It seems easier and doesn't require extra Python packages.

https://gist.github.com/33eyes/431e3d432f73371509d176d0dfb95b6e

BijalBPatel commented 1 year ago

All users would need to do is clone the repository and run the following in a terminal within the repository folder:

git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
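
For reference, git only runs a clean filter on paths that carry a matching `filter` attribute, so the `git config` command has to be paired with a `.gitattributes` entry. The sketch below is one way the full per-clone setup might look. The helper name `setup_notebook_filter` and the `*.ipynb` pattern are assumptions for illustration, not something already in this repository; the filter command itself is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsoxs_scans): configure the
# output-stripping filter for one clone.
# Usage: setup_notebook_filter /path/to/repo
setup_notebook_filter() {
    repo_dir=${1:-.}

    # Assumption: notebooks are matched with a '*.ipynb' rule. Append the rule
    # only if .gitattributes does not already contain this exact line.
    attr_line='*.ipynb filter=strip-notebook-output'
    if ! grep -qxF "$attr_line" "$repo_dir/.gitattributes" 2>/dev/null; then
        printf '%s\n' "$attr_line" >> "$repo_dir/.gitattributes"
    fi

    # Per-clone setting: git config is not shared through the repository,
    # so every user has to run this once after cloning.
    git -C "$repo_dir" config filter.strip-notebook-output.clean \
        'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

    # Report which filter git would apply to a notebook path.
    git -C "$repo_dir" check-attr filter -- example.ipynb
}
```

Running `setup_notebook_filter .` from the repository root should report `strip-notebook-output` as the filter for `example.ipynb`. If `.gitattributes` is committed to the repository, the `git config` line is the only step each user still has to run.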

EliotGann commented 1 year ago

Before committing, we can just clear all outputs from the Jupyter menu.

BijalBPatel commented 1 year ago

I think people will forget to do it. That said, this is a minor concern since we only have one notebook in the repository. It is more of an issue for data-analysis-heavy repositories.

Now that a good solution is documented, we can revisit this if it becomes a problem.