elasticdog / transcrypt

transparently encrypt files within a git repository
MIT License

Consider improving transcrypt's handling of large files #85

Open jmurty opened 4 years ago

jmurty commented 4 years ago

As @perost-l14 mentioned in these comments on #78, transcrypt currently does some things that hinder its use for encrypting many and/or large files.

This ticket is to draw out suggested improvements so they don't get lost in the broader discussion in #78.

In particular, as paraphrased by me (@jmurty):

Would it make sense to update transcrypt to use binary data instead of base64, and set or recommend -delta in .gitattributes by default?

What would be the implications of doing these things, for both new transcrypt'ed repos and existing ones?
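
For reference, the `-delta` part of the question could look something like the sketch below. It assumes a transcrypt-style attribute line (`filter=crypt diff=crypt merge=crypt`) applied to a hypothetical `secrets/` directory. Unsetting Git's `delta` attribute tells Git not to attempt delta compression for blobs at matching paths, which avoids wasted effort on ciphertext that is unlikely to delta well anyway.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the example does not touch a real one
repo=$(mktemp -d)
cd "$repo"
git init -q

# transcrypt-style attributes plus "-delta" for a hypothetical secrets/ directory
echo 'secrets/** filter=crypt diff=crypt merge=crypt -delta' >> .gitattributes

# Ask Git which attributes apply to a sample path (the file need not exist)
git check-attr filter delta -- secrets/api_key.txt
```

`git check-attr` should report `filter: crypt` and `delta: unset` for the sample path, which is a quick way to confirm a pattern matches before committing it.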

ZhymabekRoman commented 1 year ago

What is the status of this improvement? I think it's a necessary feature even for small files. Using base64 isn't necessary for Git repos.
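
One concrete cost of base64 is size. The sketch below encrypts the same random input with `openssl enc`, once as raw binary and once with `-a` (base64, wrapped at 64 characters per line), and compares byte counts. The cipher, the `-md sha256` key derivation and the password are illustrative choices, not necessarily transcrypt's exact invocation.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# 30,000 bytes of random data stands in for a secret file
head -c 30000 /dev/urandom > plain.bin

# Same encryption settings for both runs; stderr hides OpenSSL's key-derivation warning
enc() { openssl enc -aes-256-cbc -md sha256 -pass pass:example "$@" -in plain.bin 2>/dev/null; }
enc -e > cipher.bin      # raw binary ciphertext
enc -e -a > cipher.b64   # base64-encoded ciphertext

bin_size=$(wc -c < cipher.bin | tr -d ' ')
b64_size=$(wc -c < cipher.b64 | tr -d ' ')
echo "binary: $bin_size bytes, base64: $b64_size bytes"
```

Expect the base64 file to be roughly a third larger than the binary one (4 output characters per 3 input bytes, plus a newline per line), on top of the extra encode/decode step for every file.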

ZhymabekRoman commented 1 year ago

I'll try to improve that, and I'll also try to optimise transcrypt, because I have a large Git repository (over 500 MB) and decryption is very slow.

jmurty commented 1 year ago

Work on improving the efficiency of transcrypt would be welcome, though be warned that using it to encrypt large amounts of data or files isn't really the expected use-case – it's intended for a few small secret files that are part of a larger repo.

That said, there might be some easy wins that would improve things without requiring a major rewrite or breaking changes.

I'd encourage you to start by looking at the building-block git_clean (encrypt) and git_smudge (decrypt) functions in the script. You can run these separately to simulate the steps Git takes behind the scenes, and testing the performance and correctness of these atomic pieces is likely to be much easier than working with a real repository.

Examples of this based on the current main branch, run within this project's repository:

```shell
# Manual and minimal transcrypt config in repository
git config --local transcrypt.cipher aes-256-cbc
git config --local transcrypt.password 'correct horse battery staple'
git config --local transcrypt.openssl-path openssl

# Decrypt the encrypted sensitive_file
cat sensitive_file | ./transcrypt smudge context=default sensitive_file

# Encrypt the decrypted sensitive_file
cat sensitive_file | ./transcrypt clean context=default sensitive_file 2>/dev/null
```

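
As a self-contained stand-in for that kind of isolated testing, the sketch below defines two hypothetical functions, `clean_stub` and `smudge_stub`, that have the shape of a Git clean/smudge filter pair (plaintext or ciphertext on stdin, result on stdout) built on `openssl enc`. They are not transcrypt's `git_clean`/`git_smudge`; in particular, the content-derived salt is an assumption motivated by Git expecting clean output to stay identical for unchanged input. The script checks correctness first, then times repeated invocations.

```shell
#!/usr/bin/env bash
# Hypothetical clean/smudge stand-ins built on openssl; NOT transcrypt's code.
set -euo pipefail
cd "$(mktemp -d)"

PASS='correct horse battery staple'

# "clean": read plaintext on stdin, write base64 ciphertext on stdout
clean_stub() {
  local tmp salt
  tmp=$(mktemp)
  cat > "$tmp"
  # Salt derived from the content, so cleaning the same input twice is byte-identical
  salt=$(openssl dgst -sha256 < "$tmp" | awk '{print $NF}' | cut -c1-16)
  openssl enc -aes-256-cbc -md sha256 -pass "pass:$PASS" -S "$salt" -a -e -in "$tmp" 2>/dev/null
  rm -f "$tmp"
}

# "smudge": read base64 ciphertext on stdin, write plaintext on stdout
smudge_stub() {
  openssl enc -aes-256-cbc -md sha256 -pass "pass:$PASS" -a -d 2>/dev/null
}

printf 'db_password=hunter2\n' > plain.txt
clean_stub < plain.txt > enc1.txt
clean_stub < plain.txt > enc2.txt

# Correctness: stable clean output, and smudge(clean(x)) == x
cmp -s enc1.txt enc2.txt && echo "clean output is stable"
smudge_stub < enc1.txt | cmp -s - plain.txt && echo "round trip OK"

# Performance: wall-clock time for 50 clean invocations on a tiny file
time (for _ in $(seq 50); do clean_stub < plain.txt > /dev/null; done)
```

Because the file is tiny, the timing mostly reflects per-invocation overhead (process start-up and temp-file handling) rather than encryption throughput, which is the cost that multiplies when Git filters many files.
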
natew commented 3 months ago

I was just wondering: is transcrypt meant to be this slow? I'm noticing very slow operations even on smaller files, and it's especially painful if you move a lot of files around. We have only ~300 encrypted files, but some operations take my M3 Pro 5 to 10 minutes when moving all of them around.

Could help sponsor speeding this up if there's some interest there.

jmurty commented 2 months ago

Hi @natew, as mentioned in prior comments, transcrypt as currently implemented isn’t intended for (and isn’t good at) handling large numbers of files or files of large size.

I don’t have time to work on this even if sponsored, and to be honest I’m not sure how much faster it could be, given that transcrypt is at bottom a series of Bash scripts invoked by Git that in turn call other shell commands to do the work.
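
To put a rough number on that fixed cost, the sketch below times 300 start-ups (matching the ~300 files mentioned above) of a Bash process that runs one trivial `openssl` command. It measures only process creation, not encryption or Git itself, and it assumes one filter invocation per file as a simplification. transcrypt's scripts likely run several commands per file, so treat the result as a probable lower bound that varies by machine.

```shell
#!/usr/bin/env bash
set -euo pipefail

n=300   # roughly the number of encrypted files mentioned above
start=$(date +%s)
for ((i = 0; i < n; i++)); do
  # One Bash process per "file", which in turn runs one external command
  bash -c 'openssl version > /dev/null'
done
end=$(date +%s)
echo "$n bash+openssl start-ups took about $((end - start)) s"
```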

Perhaps someone else would be able to do some investigation? The first place I would suggest researching is whether Git runs the smudge and clean commands in series or in parallel and, if not already in parallel, whether they can be made to run that way.
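
One way to start that investigation is a timing probe. The sketch below configures a hypothetical `slowfilter` driver whose smudge command sleeps one second before passing content through unchanged, commits four files that use it, deletes them, and times `git checkout` restoring them. If Git runs the filters one after another, the elapsed time will be close to four seconds; if it runs them in parallel, noticeably less.

```shell
#!/usr/bin/env bash
# Probe: does Git run a configured smudge filter serially or in parallel?
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email probe@example.com
git config user.name probe

# Hypothetical "slowfilter" driver: identity filters, with a 1 s delay on smudge
git config filter.slowfilter.clean cat
git config filter.slowfilter.smudge 'sleep 1; cat'
echo '*.txt filter=slowfilter' > .gitattributes

for i in 1 2 3 4; do echo "file $i" > "f$i.txt"; done
git add . && git commit -qm init

# Delete the working-tree copies, then time Git restoring them through the filter
rm f*.txt
start=$(date +%s)
git checkout -- .
end=$(date +%s)
echo "restored 4 files in about $((end - start)) s"
```

To see whether Git's parallel checkout changes the picture, one option is to delete the files again and rerun the timed step as `git -c checkout.workers=4 -c checkout.thresholdForParallelism=1 checkout -- .` (the threshold is lowered because the probe uses only four files). The sketch doesn't assume either outcome.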