Closed (plattrap closed this 4 years ago)
Thanks for the report. This seems like a bug while traversing directory content with UTF-8 file names. Let me check.
Are you using the latest Docker image? Try (docker pull laks/dduper). For me that file name seems to work.
Skipped /mnt/Finland.J$344rvenp$344344-Elisa.xml not unique regular files or file size < 4kb
Perfect match : /mnt/a1 /mnt/Показатели
Perfect match : /mnt/a1 /mnt/Показат
Perfect match : /mnt/a1 /mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml
Skipped /mnt/Finland.J$344rvenp$344344-Elisa.xml not unique regular files or file size < 4kb
Perfect match : /mnt/a1 /mnt/Показатели
Perfect match : /mnt/a1 /mnt/Показат
Perfect match : /mnt/a1 /mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml
Skipped /mnt/Finland.J$344rvenp$344344-Elisa.xml not unique regular files or file size < 4kb
Perfect match : /mnt/a1 /mnt/Показатели
Perfect match : /mnt/a1 /mnt/Показат
Perfect match : /mnt/a1 /mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml
Skipped /mnt/Finland.J$344rvenp$344344-Elisa.xml not unique regular files or file size < 4kb
Perfect match : /mnt/a1 /mnt/Показатели
Perfect match : /mnt/a1 /mnt/Показат
Perfect match : /mnt/a1 /mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml
Skipped /mnt/Finland.J$344rvenp$344344-Elisa.xml not unique regular files or file size < 4kb
Perfect match : /mnt/a1 /mnt/Показатели
Perfect match : /mnt/a1 /mnt/Показат
Perfect match : /mnt/a1 /mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml
Skipped /mnt/Finland.J$344rvenp$344344-Elisa.xml not unique regular files or file size < 4kb
Perfect match : /mnt/a1 /mnt/Показатели
Perfect match : /mnt/a1 /mnt/Показат
Perfect match : /mnt/a1 /mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml
+----------------+-------------------------------------------------------------+---------------+
| Chunk Size(KB) | Files | Duplicate(KB) |
+----------------+-------------------------------------------------------------+---------------+
| 256 | /mnt/a1:/mnt/a2 | 51200 |
| 256 | /mnt/a1:/mnt/Показатели | 51200 |
| 256 | /mnt/a1:/mnt/Показат | 51200 |
| 256 | /mnt/a1:/mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml | 51200 |
+----------------+-------------------------------------------------------------+---------------+
dduper:204800KB of duplicate data found with chunk size:256KB
+----------------+-------------------------------------------------------------+---------------+
| Chunk Size(KB) | Files | Duplicate(KB) |
+----------------+-------------------------------------------------------------+---------------+
| 512 | /mnt/a1:/mnt/a2 | 51200 |
| 512 | /mnt/a1:/mnt/Показатели | 51200 |
| 512 | /mnt/a1:/mnt/Показат | 51200 |
| 512 | /mnt/a1:/mnt/Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml | 51200 |
+----------------+-------------------------------------------------------------+---------------+
dduper:204800KB of duplicate data found with chunk size:512KB
Some more detail on the file name: I dumped the file system representation of it and of another copy with a better encoding.
The problem seems to be with the bad file name: the ä is encoded as a single e4 byte, and in the good one as c3 a4. So Python 3 is trying to decode the sequence e4 72 as UTF-8, and not as the two characters är in ISO-8859-1 (Wikipedia).
The solution probably is to treat file and directory names as byte strings, and do some extra checks before displaying them?
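A minimal sketch of that idea, assuming a Linux file system that accepts arbitrary bytes in names. The helper name `iter_file_names` is hypothetical and is not part of dduper; it only illustrates walking a tree with byte-string paths and producing a separate, safe string for display.

```python
import os


def iter_file_names(root):
    """Yield (raw_path, display_path) for every file under root.

    raw_path is the exact byte string stored by the file system and is
    what should be passed to open() or other system calls.
    display_path is only meant for printing.
    """
    # Passing bytes to os.walk makes it yield bytes, so no decoding
    # of the file names happens during traversal.
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(root)):
        for name in filenames:
            raw = os.path.join(dirpath, name)
            # Invalid UTF-8 bytes are shown as \xNN escapes instead of
            # raising UnicodeDecodeError.
            yield raw, raw.decode("utf-8", errors="backslashreplace")
```

With this split, the raw bytes are used for all file operations and the decoded string is used only in log lines and tables.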
Finland.Järvenpää-Elisa.xml
46 69 6e 6c 61 6e 64 2e 4a c3 a4 72 76 65 6e 70 c3 a4 c3 a4 2d 45 6c 69 73 61 2e 78 6d 6c
Finland.J�rvenp��-Elisa.xml
46 69 6e 6c 61 6e 64 2e 4a e4 72 76 65 6e 70 e4 e4 2d 45 6c 69 73 61 2e 78 6d 6c
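The two dumps above can be checked directly in Python. This is only an illustration of the decoding behaviour described here, and `try_decode` is a hypothetical helper name:

```python
# Byte sequences copied from the two hex dumps above.
good = bytes.fromhex(
    "46 69 6e 6c 61 6e 64 2e 4a c3 a4 72 76 65 6e 70 "
    "c3 a4 c3 a4 2d 45 6c 69 73 61 2e 78 6d 6c"
)
bad = bytes.fromhex(
    "46 69 6e 6c 61 6e 64 2e 4a e4 72 76 65 6e 70 "
    "e4 e4 2d 45 6c 69 73 61 2e 78 6d 6c"
)


def try_decode(raw, encoding):
    """Return the decoded string, or None if raw is invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


print(try_decode(good, "utf-8"))       # valid UTF-8
print(try_decode(bad, "utf-8"))        # None: e4 starts a multi-byte sequence, 72 cannot continue it
print(try_decode(bad, "iso-8859-1"))   # every byte is a valid ISO-8859-1 character
```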
Zip of the two files attached: F_test.zip
Thanks for the details and the zip file. They helped a lot during testing. Please pull the latest Docker image; it should work now.
I replaced print(filename) with print(repr(filename)). With this change, docker prints the following:
Perfect match : '/mnt/f/Finland.Järvenpää-Elisa.xml' '/mnt/f/a'
+----------------+---------------------------------------------+---------------+
| Chunk Size(KB) | Files | Duplicate(KB) |
+----------------+---------------------------------------------+---------------+
| 256 | /mnt/f/Finland.Järvenpää-Elisa.xml:/mnt/f/a | 51200 |
+----------------+---------------------------------------------+---------------+
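One plausible reason the repr() change helps, sketched under the assumption that the names reach print() as str objects decoded with Python's default surrogateescape handling (the thread does not confirm how dduper obtains them). The helper `can_encode` is hypothetical:

```python
# The badly encoded name, decoded the way Python decodes file names
# on a UTF-8 system: undecodable bytes become lone surrogates.
raw = b"Finland.J\xe4rvenp\xe4\xe4-Elisa.xml"
name = raw.decode("utf-8", errors="surrogateescape")


def can_encode(text, encoding="utf-8"):
    """Return True if text can be written to a stream using this encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Lone surrogates cannot be encoded as UTF-8, so printing the name
# directly to a UTF-8 stdout would raise UnicodeEncodeError.
print(can_encode(name))

# repr() escapes the surrogates, so the result is always printable.
print(can_encode(repr(name)))
print(repr(name))
```

The trade-off is that repr() adds quotes and escape sequences to every name in the output, including well-formed ones.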
PS: this fix is only in the Docker image so far; I still need to add it to the repo's master branch.
Thanks, works with the latest docker image.
Thanks, @plattrap, for the confirmation. I'll go ahead and mark this as resolved. Please report any issues you encounter.
Backed up an old Windows disk onto a BTRFS-backed network share. Now dduper throws an exception on one of the filenames. ls gives the filename as: 'Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml'
Using the docker image.
sudo docker run -it --device /dev/sdc -v /media/backup/:/mnt laks/dduper dduper --device /dev/sda1 --dir /mnt --analyze --recurse