Closed ljm42 closed 2 years ago
Hi @ljm42 can you please test the change in PR #117 to see if it fixes the problem with small files?
This is great! I really appreciate you looking into this, and for your work on #114 as well!
My dataset includes txz files, which are essentially zips that have been renamed. And apparently they contain null bytes in the first 8 chars, because they cause this line:
firstbytes=$(head -c8 "$tempfile")
to throw this error:
transcrypt: line 134: warning: command substitution: ignored null byte in input
(ignore the line number, I added some debug code to figure out which files were causing the error)
I solved it by converting null bytes to spaces:
firstbytes=$(head -c8 "$tempfile" | tr '\0' ' ')
and now git_clean()
works on both small files and zip files.
It looks like all of the encryption unit tests are based on the string My secret content
. Would it make sense to include basic tests for other variations such such as a string that is less than 8 characters, and a super small zip file? Just to ensure that these types of files will work in future versions?
Hi @ljm42 I have updated the PR #117 with a couple of extra tests for small files and files with a null byte near the start.
Can you check that the fix-clean-filter-for-small-files
branch works for your problem files? If so, I should be able to merge what is a fairly minor change.
Sorry, #117 fix-clean-filter-for-small-files
branch still throws warnings with txz files (although I think it does actually encrypt them). Here is a test with a small txz file (rather than the string from the new test sh\0x
):
# echo "sample text" > secret.txt
# tar cfJ archive.txz secret.txt
# rm secret.txt
# echo "archive.txz filter=crypt diff=crypt merge=crypt" >> .gitattributes
# git add .gitattributes archive.txz
.git/crypt/transcrypt: line 129: warning: command substitution: ignored null byte in input
Running line 129 directly results in the same:
# firstbytes=$(head -c8 archive.txz)
-bash: warning: command substitution: ignored null byte in input
With this modification there are no more warnings:
# firstbytes=$(head -c8 archive.txz | tr '\0' ' ')
I am not sure why the string sh\0x
does not throw the same warning
Thanks for the continued test feedback. This one is tricky.
I haven't been able to reproduce the warning message on my Mac, so I Googled and it looks like that warning is new in Bash version 4.4: https://askubuntu.com/questions/926626/how-do-i-fix-warning-command-substitution-ignored-null-byte-in-input
Unfortunately, while adding the simple tr
command might work for Bash 4.4 and later it doesn't work for the version on my up-to-date Mac 3.2.57(1):
$ firstbytes=$(head -c8 archive.txz | tr '\0' ' ')
tr: Illegal byte sequence
So it looks like new bash complains about null read by head
but not tr
, while older bash is fine reading other (non-null) bytes in head
but can fail if you pass them to tr
depending on your shell's character encoding settings. Specifically for this test case, the byte with octal character code 375 breaks tr
for my older bash in my shell with LANG=en_GB.UTF-8
The only workaround I have found so far is this hack with an explicit LC_ALL=C
to override any default character encoding settings. It's ugly, but might be the best we can do:
firstbytes=$(head -c8 archive.txz | LC_ALL=C tr -d '\0')
Notice also that tr
deletes instead of substituting nulls.
I have applied the LC_ALL=C tr -d '\0'
fix on the fix-clean-filter-for-small-files
branch, can you please try it again @ljm42?
So sorry for the delayed response! This fix works great for me. I'm impressed with how many variations of bash/etc that you are able to handle!
There are two other instances of "head -c8" in transcrypt, do they need need this as well?
I'm very late getting to this, sorry.
It's possible the other uses head -c8
are problematic, but since you haven't seen any definite problems for the unusual files in this ticket I'm reluctant to change them just in case.
For now I'll leave the minimal change as it is, and finally get it merged to main
via #117
Awesome, thank you very much!
If you try to encrypt a small file (less than 8 chars) it fails:
The problem is with the
read
command in git_clean(): https://github.com/elasticdog/transcrypt/blob/dce1ad0dc41d3fb5eb754e5e135761910f275968/transcrypt#L129One solution is to add an
or
statement like this: