carlbeech / fast-duplicate-finder

A python program to locate duplicate files - and do it fast
GNU Lesser General Public License v3.0

Do not crash when character cannot be encoded #10

Open koppor opened 4 years ago

koppor commented 4 years ago

 FileCount 5596
Total files:497153
 =======================================  Matching files 497151
Traceback (most recent call last):
  File "fdf_scanner.py", line 1188, in <module>
    GenerateOutput()
  File "fdf_scanner.py", line 564, in GenerateOutput
    OutFile.write('\n'+OutputFileRemark+'    (' + FileDB[i][4] + ') ' + FileDB[i][1] + ' SAVE Size:' + str(FileDB[i][3]))
  File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0191' in position 151: character maps to <undefined>

Fortunately, the .bat file had already been written by that point, without any issue.
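The traceback shows the output file being written with the Windows default `cp1252` codec, which has no mapping for `'\u0191'`. One possible way to avoid the crash is sketched below. It assumes the report file can be opened with an explicit encoding; `open_text_output` is a hypothetical helper, not a function from `fdf_scanner.py`.

```python
def open_text_output(path):
    """Open a text file for writing that will not raise on unusual file names.

    Hypothetical helper (not part of fdf_scanner.py).
    - encoding="utf-8" can represent every Unicode character, unlike cp1252.
    - errors="backslashreplace" writes anything that still cannot be encoded
      (for example lone surrogates) as an escape such as \\udcff instead of
      raising UnicodeEncodeError.
    """
    return open(path, "w", encoding="utf-8", errors="backslashreplace")


def encode_for_legacy_codec(text, codec="cp1252"):
    """Encode text for a legacy codec, escaping characters it cannot map."""
    return text.encode(codec, errors="backslashreplace")
```

With this, a name containing `'\u0191'` is written as a normal UTF-8 character, and the `cp1252` variant degrades to a visible `\u0191` escape rather than aborting the run.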

Cycor commented 3 years ago

I am having almost the same issue on Linux:

 FileCount 23814
Total files:36240
Sorting...
Matching...
 =                                        Matching files 1588 saving...
Traceback (most recent call last):
  File "fdf_scanner.py", line 1300, in <module>
  File "fdf_scanner.py", line 564, in CalculateHashes
  File "fdf_scanner.py", line 713, in SaveDatabase
  File "src/lxml/etree.pyx", line 1024, in lxml.etree._Element.text.__set__
  File "src/lxml/apihelpers.pxi", line 747, in lxml.etree._setNodeText
  File "src/lxml/apihelpers.pxi", line 735, in lxml.etree._createTextNode
  File "src/lxml/apihelpers.pxi", line 1532, in lxml.etree._utf8
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 64-66: surrogates not allowed
[6488] Failed to execute script fdf_scanner
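The "surrogates not allowed" error is typical of Linux file names whose bytes are not valid UTF-8: Python decodes them with the `surrogateescape` handler, and the resulting lone surrogates cannot be encoded as UTF-8 when lxml stores the text. A minimal sketch of one way to clean such names before saving them follows. `printable_name` is a hypothetical helper, and it assumes the surrogates came from `surrogateescape` decoding (as with `os.listdir` or `os.fsdecode`).

```python
def printable_name(name: str) -> str:
    """Return a copy of a file name that is safe to encode as UTF-8.

    Hypothetical helper (not part of fdf_scanner.py). Assumes any surrogates
    in `name` were produced by the surrogateescape error handler.
    """
    # Recover the original bytes of the file name.
    raw = name.encode("utf-8", errors="surrogateescape")
    # Decode again, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

The cleaned name is only suitable for display or for storing in the XML database. The original string should still be used when opening the file, since the replacement is lossy.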

koppor commented 3 years ago

Meanwhile, I found fdupes, which has been maintained since 1999. I also started writing my own tool (https://github.com/koppor/kodf), but have not fully succeeded yet.

carlbeech commented 3 years ago

Hi, I thought I'd fixed this one. I'll take another look: I had another app with a similar fault, so I'll see what the differences are and backport the fix. Thanks, Carl

carlbeech commented 3 years ago

Cycor / Oliver, are you able to run from source code, or do you need a binary build? I've uploaded a 0.9 test version with extra checks around filenames. Could you try it out and let me know whether it improves the situation?

Many thanks

Carl.

Cycor commented 3 years ago

I built the latest source and that fixed it, thanks!

By the way, I was using the precompiled v0.8 release because I had some trouble figuring out all the Qt packages required to run the console version.