Closed cocox closed 1 year ago
The failing files are: "lp/2017/lp-marca/latam/brand/img/Sin-t\303\255tulo-2.jpg" "lp/2017/lp-marca/latam/brand/img/Sin-t\355tulo-2.jpg"
Just created a brand new repo with just these 2 files:
"lp/2017/lp-marca/latam/brand/img/Sin-t\303\255tulo-2.jpg"
This one seems to work just fine
"lp/2017/lp-marca/latam/brand/img/Sin-t\355tulo-2.jpg"
This does not look like a valid UTF-8 filename... and it triggers the exact error you posted.
Not sure how I could handle such "invalid" filenames, or even if I should handle them...
rodrigo@desktop ~/teste $ git init
Reinitialized existing Git repository in /home/rodrigo/teste/.git/
rodrigo@desktop ~/teste $ git config hooks.allownonascii true
rodrigo@desktop ~/teste $ touch "$(printf "Sin-t\303\255tulo-2.jpg")"
rodrigo@desktop ~/teste $ ls -l
total 0
-rw-rw-r-- 1 rodrigo rodrigo 0 Jul 13 04:55 Sin-título-2.jpg
rodrigo@desktop ~/teste $ git add .
rodrigo@desktop ~/teste $ git commit -m 'initial'
[main (root-commit) 6f203a3] initial
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 "Sin-t\303\255tulo-2.jpg"
rodrigo@desktop ~/teste $ git-restore-mtime --verbose
1 files to be processed in work dir
Line # Log # F.Left Modification Time File Name
3 1 0 2023-07-13 04:56:14 Sin-título-2.jpg
3 1 - 2023-07-13 04:56:14 ./
Statistics:
0.01 seconds
3 log lines processed
1 commits evaluated
1 directories updated
1 files updated
rodrigo@desktop ~/teste $ ls -l
total 0
-rw-rw-r-- 1 rodrigo rodrigo 0 Jul 13 04:56 Sin-título-2.jpg
rodrigo@desktop ~/teste $ touch "$(printf "Sin-t\355tulo-2.jpg")"
rodrigo@desktop ~/teste $ ls -l
total 0
-rw-rw-r-- 1 rodrigo rodrigo 0 Jul 13 04:56 Sin-título-2.jpg
-rw-rw-r-- 1 rodrigo rodrigo 0 Jul 13 04:57 'Sin-t'$'\355''tulo-2.jpg'
rodrigo@desktop ~/teste $ git add .
rodrigo@desktop ~/teste $ git commit -m 'bad filename'
[main 8d716db] bad filename
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 "Sin-t\355tulo-2.jpg"
rodrigo@desktop ~/teste $ git-restore-mtime --verbose
Traceback (most recent call last):
File "/home/rodrigo/.local/bin/git-restore-mtime", line 594, in <module>
sys.exit(main())
File "/home/rodrigo/.local/bin/git-restore-mtime", line 486, in main
filelist = set(git.ls_files(args.pathspec))
File "/home/rodrigo/.local/bin/git-restore-mtime", line 311, in <genexpr>
return (normalize(_) for _ in self._run('ls-files --full-name', paths))
File "/home/rodrigo/.local/bin/git-restore-mtime", line 254, in normalize
.decode('utf8')) # Decode from UTF-8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 5: invalid continuation byte
Just so I can reproduce your environment, please tell me:
- What version are you using, and where did you get it from? (is git restore-mtime --version available?) git-restore-mtime version 2022.12, through APT
- What is the underlying filesystem? NTFS, EXT4, something else? ext4
- What platform/OS and version? Debian 12
- Can you paste the non-escaped, UTF-8 filenames here? I'm having some trouble re-creating them. I think you could reproduce, correct?
- Can you paste the non-escaped, UTF-8 filenames here? I'm having some trouble re-creating them. I think you could reproduce, correct?
I guess I did, just not sure if the filenames I created are exactly the same as yours.
The first one, with proper UTF-8, git restore-mtime seems to handle just fine. Can you confirm that by creating a brand new repository with just that file?
The second one, Sin-t\355tulo-2.jpg, is the problematic one. But I'm not sure if I'm re-creating it accurately. Is it really a filename with invalid UTF-8 encoding? This looks like the old Windows-1252 encoding (\355 is 0xED, which is í in that encoding, the same as \303\255 in UTF-8).
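To make the octal escapes above concrete, here is a small check in plain Python. It uses only the standard codecs; the variable names are just for illustration.

```python
# The two escaped forms from the thread, written as raw bytes.
utf8_bytes = b"\303\255"   # octal escapes for the bytes 0xC3 0xAD
legacy_byte = b"\355"      # octal escape for the single byte 0xED

print(utf8_bytes.hex(), legacy_byte.hex())   # prints: c3ad ed

# 0xC3 0xAD is the UTF-8 encoding of U+00ED (i with acute accent).
utf8_char = utf8_bytes.decode("utf-8")

# 0xED maps to the same character in Windows-1252.
legacy_char = legacy_byte.decode("cp1252")

print(utf8_char == legacy_char == "\u00ed")  # prints: True
```

So both filenames spell the same visible name, just stored with different byte sequences.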
Mixing different encodings in the same filesystem is problematic enough, let alone committing such files to a git repository. I might be able to handle such cases, just not sure if git restore-mtime should deal with invalid (or mixed) encodings.
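One possible way to tolerate such names, sketched here purely as an illustration and not as what git restore-mtime does or will do: decode with Python's "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate instead of raising. The helper name unquote_git_path is hypothetical, and the sketch assumes git's default quoting (core.quotePath enabled), so a quoted path contains only ASCII characters and backslash escapes.

```python
def unquote_git_path(path: str) -> str:
    """Hypothetical helper: undo git's C-style quoting without raising on bad UTF-8."""
    if not (len(path) >= 2 and path[0] == '"' and path[-1] == '"'):
        return path  # git only quotes paths that need escaping
    raw = (path[1:-1]
           .encode("latin1")          # str -> bytes, as required by 'unicode-escape'
           .decode("unicode-escape")  # expand \ooo, \\, \" and similar escapes
           .encode("latin1"))         # code points 0-255 -> the original raw bytes
    # Invalid bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
    return raw.decode("utf-8", errors="surrogateescape")


good = unquote_git_path(r'"Sin-t\303\255tulo-2.jpg"')
bad = unquote_git_path(r'"Sin-t\355tulo-2.jpg"')

print(good == "Sin-t\u00edtulo-2.jpg")  # prints: True
print(ascii(bad))                       # prints: 'Sin-t\udcedtulo-2.jpg'

# Encoding with the same error handler restores the exact on-disk bytes.
print(bad.encode("utf-8", errors="surrogateescape"))  # prints: b'Sin-t\xedtulo-2.jpg'
```

The trade-off is that such a string cannot be printed or encoded as strict UTF-8, so any code that displays filenames would need the same error handler. os.fsdecode and os.fsencode apply this handler with the filesystem encoding on POSIX systems.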
@cocox: another test, please post the result of python3 -c 'import os; d = os.listdir(); print(d); [print(_) for _ in d]' in a directory containing just those 2 files?
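A slightly longer variant of the same check, in case the one-liner's output is hard to read. This is only a sketch: the function name split_by_utf8_validity is made up for illustration. It lists the directory as bytes, so Python does no implicit decoding, and then reports which names are not valid UTF-8.

```python
import os


def split_by_utf8_validity(names):
    """Hypothetical helper: separate byte filenames into valid and invalid UTF-8."""
    valid, invalid = [], []
    for name in names:
        try:
            valid.append(name.decode("utf-8"))
        except UnicodeDecodeError:
            invalid.append(name)  # keep the raw bytes for inspection
    return valid, invalid


if __name__ == "__main__":
    # Passing a bytes path makes os.listdir return bytes names.
    valid, invalid = split_by_utf8_validity(os.listdir(b"."))
    print("valid UTF-8:", [ascii(name) for name in valid])
    print("not UTF-8:  ", invalid)
```

Run it from the directory containing the two files; the second list shows the raw bytes of any name that would trip a strict UTF-8 decode.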
If I try to execute the command 'git restore-mtime --test' I get this error:
Traceback (most recent call last):
File "/usr/lib/git-core/git-restore-mtime", line 594, in <module>
sys.exit(main())
^^^^^^
File "/usr/lib/git-core/git-restore-mtime", line 530, in main
parse_log(filelist, dirlist, stats, git, args.merge, args.pathspec)
File "/usr/lib/git-core/git-restore-mtime", line 410, in parse_log
file = normalize(file)
^^^^^^^^^^^^^^^
File "/usr/lib/git-core/git-restore-mtime", line 254, in normalize
.decode('utf8')) # Decode from UTF-8
^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 38: invalid continuation byte
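For what it's worth, the reported position is consistent with the \355 byte in the full path from the original report. A quick sketch to check this, using a hypothetical helper first_bad_byte and the path with \355 written as the raw byte 0xED:

```python
def first_bad_byte(raw: bytes):
    """Hypothetical helper: return (offset, byte, reason) of the first UTF-8 error, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start], exc.reason
    return None


# Full path from the report, and the bare filename from the test repo.
full_path = b"lp/2017/lp-marca/latam/brand/img/Sin-t\355tulo-2.jpg"
short_name = b"Sin-t\355tulo-2.jpg"

print(first_bad_byte(full_path))   # prints: (38, 237, 'invalid continuation byte')
print(first_bad_byte(short_name))  # prints: (5, 237, 'invalid continuation byte')
```

237 is 0xED, so position 38 here and position 5 in the test repo point at the same byte of the same filename.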