Add pre-commit config - Githubissues

adrn commented 2 years ago

This adds a config file for pre-commit, which includes a run of nbstripout to ensure that cell output has been removed along with python version, TOC metadata, and kernelspec info. I think this solves astropy/nbcollection#16!

I also configured pre-commit.ci to run on PRs in this repo. We need to add some info about the expected contributor workflow. Since pre-commit.ci will push changes to contributor branches, this means contributors need to make sure to pull before they modify notebooks contributed in PRs. It also means that we should squash-and-merge by default to make sure cell outputs don't end up in the commit history. I think this is still simpler than asking contributors to rebase if they accidentally commit cell output. I added TODO items here to remind me to document this.

One question: should we disable merge commit merging, and only allow squash or rebase merges?

Fixes #518

review-notebook-app[bot] commented 2 years ago

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

adrn commented 2 years ago

Whoops - the failing "Test utility scripts" is my fault. I set up that workflow to run when the "Infrastructure" tag is present. But it works by diffing against main, and in this case I modified a bunch of other notebooks. I have to rethink when to run those tests...

jonathansick commented 2 years ago

One question: should we disable merge commit merging, and only allow squash or rebase merges?

With the goal of making sure we don't increase the repo size by merging commits with outputs to main, I think the only workable strategy is squash. That's my reading of https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/about-merge-methods-on-github#squashing-your-merge-commits Do you agree? I'm definitely in favour of changing the settings to enforce this.

I do wonder how GitHub records authorship for these squashes, since both the original author and pre-commit.ci's bot are "authors", and of course the person who triggers the merge is a committer. I think it'll work out, but it's something to keep an eye on.

mwcraig commented 2 years ago

I haven't reviewed in detail but I like the approach.

mwcraig commented 2 years ago

With the goal of making sure we don't increase the repo size by merging commits with outputs to main, I think the only workable strategy is squash.

I don't know if even squashing will remove big commits entirely....it might, but haven't experimented. I've used this for cleaning out big files/commits in the past: https://rtyley.github.io/bfg-repo-cleaner/

adrn commented 2 years ago

@jonathansick

In the make init you can also install dependencies, etc. I just find it easier to document running a make command rather than explaining pre-commit sometimes.

Aha, I like it! I guess we could also add a pip install -r requirements-dev.txt to install Jupyter, nbcollection, and all notebook dependencies as well?

@mwcraig

I don't know if even squashing will remove big commits entirely

I think if the commit that added the big-ness was in the same set of commits as the squash, it will remove the big-ness? But I could be wrong!

jonathansick commented 2 years ago

I don't know if even squashing will remove big commits entirely

I think if the commit that added the big-ness was in the same set of commits as the squash, it will remove the big-ness? But I could be wrong!

Yeah, if the squashed commit doesn't contain that big content, that big content will never make it to the main branch on origin here. What might happen though is in a contributor's cloned repo the big commits might stay around (something to do with the reflog???). But I'm pretty sure this won't have an impact on our ultimate goal of keeping the size of the origin repo in check for clones.

astropy / astropy-tutorials

Add pre-commit config #531