bingmann / digup

digup - A Digest Updating Tool
http://panthema.net/2009/digup/
GNU General Public License v3.0

cannot open files containing unicode characters #3

Open rizalp opened 3 weeks ago

rizalp commented 3 weeks ago

Hi, I'd like to say thanks for this utility. It helps me checksum files stored on external disks and be confident that files are copied correctly between the internal disk mounted on the host OS (Ubuntu / Windows dual boot) and the external drive. However, I found a problem.

On Windows, digup seems unable to open files whose names contain Unicode characters. This problem doesn't exist on Linux. I tried another tool called dirhash, which is able to open those files without a problem.

Sample, using default Windows Terminal and PowerShell version 5.1 on Windows 11:

PS D:\Downloads\test> echo "123" > "sample1:"filename?|🔥".txt"
PS D:\Downloads\test> dir

    Directory: D:\Downloads\test

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         11/2/2024   2:12 PM             12 sample1:"filename?|🔥".txt

PS D:\Downloads\test> digup
C:\Users\rizalp\bin\digup.exe: no digest file found. Creating "sha1sum.txt" from full scan.
C:\Users\rizalp\bin\digup.exe: could not stat file "./sample1:"filename?|??".txt": No such file or directory
C:\Users\rizalp\bin\digup.exe: no deleted files detected during scan.
Scan finished. File scan summary:
      Total: 0
Command (see help)? save
C:\Users\rizalp\bin\digup.exe: wrote 0 digests to sha1sum.txt
PS D:\Downloads\test> dirhash . -sum

DirHash 1.26.1 by Mounir IDRASSI (mounir@idrix.fr) Copyright 2010-2024

Recursively compute hash of a given directory content in lexicographical order.
It can also compute the hash of a single file.

Supported Algorithms :
 MD5 SHA1 SHA256 SHA384 SHA512 Streebog Blake2s Blake2b Blake3

Using Blake3 to compute checksum of "." ...
6D2DBC0B0257EB467D0E33417752E6A2E2DDE12717B9ED65AF0C057FDF49CE66  .\sample1:"filename?|🔥".txt
221048A51F561B830698F43930AD18119BB5F4D73A9A386DB87395258708460C  .\sha1sum.txt

This becomes a problem when users create files with Unicode names on Linux and generate the sum file there, but then cannot verify it on Windows.
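The `??` in the "could not stat file" message above is consistent with the file name being converted from UTF-16 to a legacy ("ANSI") code page somewhere along the way. The following Python sketch only illustrates that effect; it is not taken from digup's source. The function name `lossy_ansi_name` and the choice of `cp1252` as the code page are assumptions made for the example.

```python
def lossy_ansi_name(name: str, codepage: str = "cp1252") -> str:
    """Approximate a wide-to-ANSI file name conversion (hypothetical helper).

    The name is processed one UTF-16 code unit at a time. Any unit that
    cannot be represented in the target code page becomes '?'.
    """
    raw = name.encode("utf-16-le")
    out = []
    for i in range(0, len(raw), 2):
        unit = raw[i:i + 2]
        try:
            ch = unit.decode("utf-16-le")  # raises for a lone surrogate
            ch.encode(codepage)            # raises if the code page lacks it
            out.append(ch)
        except UnicodeError:
            out.append("?")
    return "".join(out)


if __name__ == "__main__":
    print(lossy_ansi_name('sample1:"filename?|🔥".txt'))
```

Because 🔥 (U+1F525) occupies two UTF-16 code units, the sketch turns it into two question marks, giving `sample1:"filename?|??".txt`. A name mangled this way no longer matches anything on disk, which would explain a "No such file or directory" error.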

rizalp commented 3 weeks ago

Quick fix: after enabling "Beta: Use Unicode UTF-8 for worldwide language support", Windows seems able to see the file:

PS D:\Downloads\test> digup
C:\Users\rizalp\bin\digup.exe: no digest file found. Creating "sha1sum.txt" from full scan.
sample1:"filename?|🔥".txt . new.
C:\Users\rizalp\bin\digup.exe: no deleted files detected during scan.
Scan finished. File scan summary:
        New: 1
      Total: 1
Command (see help)? save
C:\Users\rizalp\bin\digup.exe: wrote 1 digests to sha1sum.txt
PS D:\Downloads\test> cat .\sha1sum.txt
# C:\Users\rizalp\bin\digup.exe last update: 2024-11-02 17:24:22 SE Asia Standard Time
#: mtime 1730531538 size 12
ae7163a3143b14a8c717d3d24183ddff26afe60e  sample1:"filename?|🔥".txt
#: crc 0x3cd124c6 eof

Also, Windows still hasn't fixed the long path limitation, for compatibility reasons.
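For the cross-platform case (sum file created on Linux, verified on Windows), a possible stopgap that does not depend on the system-wide UTF-8 setting is a small Python checker, since Python opens files by Unicode name. This is a minimal sketch, not a replacement for digup. It assumes the digest file is UTF-8 and uses the layout shown in `sha1sum.txt` above: lines starting with `#` are comments, and each entry is a SHA-1 hex digest, two spaces, then the file name. It does not handle escaped file names or the `#:` metadata lines, and `verify_digest_file` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def verify_digest_file(digest_path):
    """Return {file name: True/False} for each entry in a sha1sum-style file."""
    digest_path = Path(digest_path)
    results = {}
    for line in digest_path.read_text(encoding="utf-8").splitlines():
        # Skip blank lines and comment/metadata lines ("#", "#:").
        if not line.strip() or line.startswith("#"):
            continue
        expected, _, name = line.partition("  ")
        target = digest_path.parent / name
        if not target.is_file():
            results[name] = False
            continue
        # Reads the whole file into memory; hash in chunks for large files.
        actual = hashlib.sha1(target.read_bytes()).hexdigest()
        results[name] = (actual == expected.lower())
    return results
```

Calling `verify_digest_file("sha1sum.txt")` in the scanned directory returns a dictionary mapping each listed file name to whether its current SHA-1 matches the recorded digest.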