[Closed] slimm609 closed this 3 years ago
Hi @slimm609 thanks for the submission. This seems like a reasonable approach to parallelise and speed up processing of many secret files, but I need to do some research to work out if backgrounding too many function calls could cause a different problem.
Can you share a little about your situation that triggers the bad performance? How many transcrypt-encrypted files do you have, and how long does a commit take before your changes?
I'm also wondering if we should have a way of easily disabling the pre-commit check?
As you point out, it's there as a safety-check only. But if it's causing poor performance then turning it off might be a better option than working around the poor performance. It's really only necessary if a user happens to use a Git client that doesn't apply the usual text conversion features of Git, so it really is guarding against a relatively unusual situation.
A better workaround might be for transcrypt to recognise a manually-applied Git config option like transcrypt.skip-pre-commit-checks
I have been using it this way for about 7 months with no issues, with about 2,500 files being encrypted. I could fairly easily cap the forking at the number of cores reported (e.g. fork at most 8 children on an 8-core machine), which would still increase the speed. At the slow point, it would take upwards of 15 to 20 seconds just for the crypt step. My local copy is tweaked a bit to use GCM instead of CBC, and uses -iter 10000 for each secret, so each run does take longer for me.
Hi @slimm609 I think that for your use-case, improving the speed of the pre-commit check might be solving the wrong problem. If you don't need the safety-check, running it 2,500 times is wasteful even if you can make it acceptably fast.
What do you think about adding an optional transcrypt config setting to bypass the pre-commit check? If you know you are using a Git client that does the right thing, there isn't much value in running the safety checks at all.
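A minimal sketch of what that opt-out could look like at the top of the pre-commit hook. The option name transcrypt.skip-pre-commit-checks comes from the suggestion above and is not an existing transcrypt setting; this is an illustration, not the project's implementation.

```shell
#!/usr/bin/env bash
# Sketch: honour a hypothetical transcrypt.skip-pre-commit-checks Git config
# option before running the expensive safety check.

skip_checks_requested() {
  # --type=bool normalises values like yes/on/1 to "true";
  # the command exits non-zero when the option is unset
  [ "$(git config --get --type=bool transcrypt.skip-pre-commit-checks 2>/dev/null)" = "true" ]
}

if skip_checks_requested; then
  # User has explicitly opted out of the safety check for this repository
  exit 0
fi

# ...the normal per-file verification would continue below...
```

A user who trusts their Git client could then run `git config transcrypt.skip-pre-commit-checks true` once per clone.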
Or you could delete the .git/hooks/pre-commit file altogether.
The repo is shared between lots of people and I can't control what client each person uses, so having the check is required, and it has stopped at least one bad commit from a user that I know of.
Ah, I see. I'm glad to hear the safety check is doing its job!
Can you please add the limit so that at most cpu-count checks run at once?
Do many encrypted files change each commit in your usage? We might be able to run the safety check only on files that have actually changed.
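A rough sketch of that idea, assuming the hook only needs to look at staged files. `check_file` is a placeholder for the real per-file crypt verification, and a complete version would also filter to paths whose .gitattributes filter is `crypt` (e.g. via `git check-attr`); neither is transcrypt's actual code.

```shell
#!/usr/bin/env bash
# Sketch: verify only the files staged in this commit, not every
# encrypted file in the repository.

check_file() { :; }  # stand-in: succeed for every file

run_checks_on_staged() {
  local file
  # -z gives NUL-separated names, safe for paths with spaces or newlines
  while IFS= read -r -d '' file; do
    check_file "$file" || return 1
  done < <(git diff --cached --name-only -z)
}
```

For a commit touching a handful of files, this would make the cost of the check independent of the 2,500-file total.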
I added forking capped at the number of processors so it won't hit limits. Before bash 4.4 it was messy to fork and wait, so on older bash versions it just runs the regular way.
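The shape of that change can be sketched as below. This is an illustration of the technique, not the merged patch itself: `check_file` stands in for the real verification, and on bash older than 4.4 (which lacks `wait -n`) it falls back to a plain sequential loop, as described above.

```shell
#!/usr/bin/env bash
# Sketch: cap concurrent background checks at the core count.

check_file() { :; }  # placeholder per-file check

num_cores() {
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4
}

verify_all() {
  local max_jobs running=0 file
  max_jobs=$(num_cores)
  if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
    for file in "$@"; do
      check_file "$file" &
      running=$((running + 1))
      if (( running >= max_jobs )); then
        wait -n              # bash 4.4+: block until any one child exits
        running=$((running - 1))
      fi
    done
    wait                     # reap the remaining children
  else
    for file in "$@"; do     # old bash: regular sequential run
      check_file "$file"
    done
  fi
}
```

`wait -n` is the piece that makes the cap practical: without it, the script could only `wait` for a whole batch at a time or poll the job table.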
Hi @slimm609 thanks very much for this, that's some advanced bash work! I have merged your change to the master branch.
I merged via a separate repo-local PR #101 where I applied a couple of tweaks to make the pre-commit script pass lint checks, and to print the bash version during test runs. From printing the bash versions during tests I learned:
Speed up git-commit-crypt hook. If there are lots of files, it can take a long time to run the hook. Since this is a fail-safe, the majority of the time there are no failures, so this saves a lot of time.

This splits it into 2 parts: 1) fork each file into its own verification so many run in parallel, ignoring filenames etc.; 2) if any of the forked runs fail, rerun in slower mode to display what failed.
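The two-phase shape described above can be sketched as follows. Function names are hypothetical and `quick_check` is a toy stand-in (it fails for any filename containing "plain") rather than transcrypt's real encryption test.

```shell
#!/usr/bin/env bash
# Sketch: quiet parallel pass first; only when something fails, a slower
# sequential pass that names the offending file(s).

quick_check() {
  # placeholder: the real hook verifies the staged blob is encrypted
  [[ "$1" != *plain* ]]
}

verify() {
  local file job failed=0
  # phase 1: fork every check, keeping only pass/fail
  for file in "$@"; do
    quick_check "$file" &
  done
  for job in $(jobs -p); do
    wait "$job" || failed=1
  done
  (( failed )) || return 0

  # phase 2: something failed, rerun sequentially to report which file
  for file in "$@"; do
    quick_check "$file" || echo "not encrypted: $file" >&2
  done
  return 1
}
```

In the common case where nothing fails, only the fast parallel pass runs, which is where the 15-to-20-second commits shrink.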