elasticdog / transcrypt

transparently encrypt files within a git repository
MIT License
1.48k stars 104 forks source link

'clean' filter fails on small files #116

Closed ljm42 closed 2 years ago

ljm42 commented 3 years ago

If you try to encrypt a small file (less than 8 chars) it fails:

# git add <filename>
fatal: <filename>: clean filter 'crypt' failed

The problem is with the read command in git_clean(): https://github.com/elasticdog/transcrypt/blob/dce1ad0dc41d3fb5eb754e5e135761910f275968/transcrypt#L129

One solution is to add an or statement like this:

  read -rn 8 firstbytes <"$tempfile" || firstbytes=X
jmurty commented 3 years ago

Hi @ljm42 can you please test the change in PR #117 to see if it fixes the problem with small files?

ljm42 commented 3 years ago

This is great! I really appreciate you looking into this, and for your work on #114 as well!

My dataset includes txz files, which are essentially zips that have been renamed. And apparently they contain null bytes in the first 8 chars, because they cause this line: firstbytes=$(head -c8 "$tempfile") to throw this error: transcrypt: line 134: warning: command substitution: ignored null byte in input (ignore the line number, I added some debug code to figure out which files were causing the error)

I solved it by converting null bytes to spaces: firstbytes=$(head -c8 "$tempfile" | tr '\0' ' ') and now git_clean() works on both small files and zip files.

It looks like all of the encryption unit tests are based on the string My secret content. Would it make sense to include basic tests for other variations such such as a string that is less than 8 characters, and a super small zip file? Just to ensure that these types of files will work in future versions?

jmurty commented 3 years ago

Hi @ljm42 I have updated the PR #117 with a couple of extra tests for small files and files with a null byte near the start.

Can you check that the fix-clean-filter-for-small-files branch works for your problem files? If so, I should be able to merge what is a fairly minor change.

ljm42 commented 3 years ago

Sorry, #117 fix-clean-filter-for-small-files branch still throws warnings with txz files (although I think it does actually encrypt them). Here is a test with a small txz file (rather than the string from the new test sh\0x):

# echo "sample text" > secret.txt
# tar cfJ archive.txz secret.txt
# rm secret.txt
# echo "archive.txz filter=crypt diff=crypt merge=crypt" >> .gitattributes
# git add .gitattributes archive.txz
.git/crypt/transcrypt: line 129: warning: command substitution: ignored null byte in input

Running line 129 directly results in the same:

# firstbytes=$(head -c8 archive.txz)
-bash: warning: command substitution: ignored null byte in input

With this modification there are no more warnings: # firstbytes=$(head -c8 archive.txz | tr '\0' ' ')

I am not sure why the string sh\0x does not throw the same warning

jmurty commented 3 years ago

Thanks for the continued test feedback. This one is tricky.

I haven't been able to reproduce the warning message on my Mac, so I Googled and it looks like that warning is new in Bash version 4.4: https://askubuntu.com/questions/926626/how-do-i-fix-warning-command-substitution-ignored-null-byte-in-input

Unfortunately, while adding the simple tr command might work for Bash 4.4 and later it doesn't work for the version on my up-to-date Mac 3.2.57(1):

$ firstbytes=$(head -c8 archive.txz | tr '\0' ' ')
tr: Illegal byte sequence

So it looks like new bash complains about null read by head but not tr, while older bash is fine reading other (non-null) bytes in head but can fail if you pass them to tr depending on your shell's character encoding settings. Specifically for this test case, the byte with octal character code 375 breaks tr for my older bash in my shell with LANG=en_GB.UTF-8

The only workaround I have found so far is this hack with an explicit LC_ALL=C to override any default character encoding settings. It's ugly, but might be the best we can do:

firstbytes=$(head -c8 archive.txz | LC_ALL=C tr -d '\0')

Notice also that tr deletes instead of substituting nulls.

jmurty commented 3 years ago

I have applied the LC_ALL=C tr -d '\0' fix on the fix-clean-filter-for-small-files branch, can you please try it again @ljm42?

ljm42 commented 3 years ago

So sorry for the delayed response! This fix works great for me. I'm impressed with how many variations of bash/etc that you are able to handle!

There are two other instances of "head -c8" in transcrypt, do they need need this as well?

jmurty commented 2 years ago

I'm very late getting to this, sorry.

It's possible the other uses head -c8 are problematic, but since you haven't seen any definite problems for the unusual files in this ticket I'm reluctant to change them just in case.

For now I'll leave the minimal change as it is, and finally get it merged to main via #117

ljm42 commented 2 years ago

Awesome, thank you very much!