elasticdog / transcrypt

transparently encrypt files within a git repository
MIT License
1.43k stars 102 forks source link

Speed up git-commit-crypt hook #93

Closed slimm609 closed 3 years ago

slimm609 commented 3 years ago

Speed up git-commit-crypt hook. If there are lots of files, it can take a long time to run the hook. Since this is a fail-safe, the majority of the time there are no failures, so this saves a lot of time.

This splits it into 2 parts 1) Fork each file into its own verification to do lots of them in parallel, ignoring filename, etc 2) If any of the forked runs fail, then run is slower mode to display what failed.

jmurty commented 3 years ago

Hi @slimm609 thanks for the submission. This seems like a reasonable approach to parallelise and speed up processing of many secret files, but I need to do some research to work out if backgrounding too many function calls could cause a different problem.

Can you share a little about your situation that triggers the bad performance? How many transcrypt-encrypted files do you have, and how long does a commit take before your changes?

I'm also wondering if we should have a way of easily disabling the pre-commit check?

As you point out, it's there as a safety-check only. But if it's causing poor performance then turning it off might be a better option than working around the poor performance. It's really only necessary if a user happens to use a Git client that doesn't apply the usual text conversion features of Git, so it really is guarding against a relatively unusual situation.

A better workaround might be for transcrypt to recognise a manually-applied Git config option like transcrypt.skip-pre-commit-checks

slimm609 commented 3 years ago

I have been using it this way for about 7 months with no issues. Had about 2500 files that it is encrypting. I could pretty easily add forking based on the number of cores reported, (e.g. fork to a max of 8 children for 8 cores) which would still increase the speed. At the slow point, it would take upwards of 15 to 20 seconds just for the crypt. My local copy is tweaked a bit to use gcm instead of CBC and using iter 10000 for each secret so it does take longer per run.

jmurty commented 3 years ago

Hi @slimm609 I think that for your use-case, improving the speed of the pre-commit check might be solving the wrong problem. If you don't need the safety-check, running it 2,500 times is wasteful even if you can make it acceptably fast.

What do you think about adding an optional transcrypt config setting to bypass the pre-commit check? If you know you are using a Git client that does the right thing, there isn't much value in running the safety checks at all.

Or you could delete the .git/hooks/pre-commit file altogether.

slimm609 commented 3 years ago

The repo is shared between lots of people and I can’t control what client each person uses, so having the check is a required and has stopped at least 1 commit from a user that I know of.

jmurty commented 3 years ago

Ah, I see. I'm glad to hear the safety check is doing its job!

Can you please add the limit so only cpu-count functions are run at once?

Do many encrypted files change each commit in your usage? We might be able to run the safety check only on files that have actually changed.

slimm609 commented 3 years ago

I added forking to match the number of procs so it won't hit limits. Before bash 4.4, it was messy to fork and wait so if its an older bash version, it will just run the regular way.

jmurty commented 3 years ago

Hi @slimm609 thanks very much for this, that's some advance bash work! I have merged your change to the master branch.

I merged via a separate repo-local PR #101 where I applied a couple of tweaks to make the pre-commit script pass lint checks, and to print the bash version during test runs. From printing the bash versions during tests I learned: