cwoac / thingy_grabber

Script for archiving thingiverse things
MIT License
39 stars 11 forks

Handle unicode names in filenames better #7

Closed cwoac closed 4 years ago

cwoac commented 4 years ago

We currently use slugify to strip invalid (unicode) characters from filenames. In principle this could cause a name collision: if a thing has several files with unicode names, two or more of them might slugify to the same filename.
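One possible way to guard against this kind of collision is sketched below. It is only an illustration, not the code in thingy_grabber: `simple_slugify` is a hypothetical stand-in for whatever slugify function the script uses, and `unique_slugs` is an assumed helper name. The idea is to remember the slugs already handed out for a thing and append a numeric suffix when a new file would reuse one.

```python
import os
import re
import unicodedata


def simple_slugify(stem: str) -> str:
    """Hypothetical stand-in for the project's slugify: ASCII-only, lowercase."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^\w\s-]", "", ascii_stem).strip().lower()
    slug = re.sub(r"[-\s]+", "-", slug)
    # A name made entirely of non-ASCII characters slugs to nothing; use a placeholder.
    return slug or "file"


def unique_slugs(names):
    """Map each original filename to a slug that is unique within this list."""
    used = set()
    result = {}
    for name in names:
        stem, ext = os.path.splitext(name)
        slug, ext = simple_slugify(stem), ext.lower()
        candidate, counter = f"{slug}{ext}", 1
        # On a collision, append -1, -2, ... until the name is free.
        while candidate in used:
            candidate = f"{slug}-{counter}{ext}"
            counter += 1
        used.add(candidate)
        result[name] = candidate
    return result


if __name__ == "__main__":
    for original, safe in unique_slugs(["Ünïcode part.stl", "Unicode part.stl"]).items():
        print(original, "->", safe)
```

The suffix depends on the order in which files are processed, so a resumed download would need to see the files in the same order (or store the mapping) to get the same names back.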

joebywan commented 4 years ago

Another instance of the issue. (After the last fix, it no longer errors on some things that failed before.)

D:\waste\thingygrabber>(thingy_grabber.py -d .\stls\ user crex37 )
Target directory .\stls\crex37 designs already exists. Assuming a resume.
Downloading 4 thing(s).
Downloading thing 0 - 4184826
Old-style download directory found. Assuming update required.
Old style download dir found at Akhilleus-??fa Pattern Ground Plunderer
Copying 0 unchanged files.
Downloading 10 new files of 10
Downloading 7 images.
Downloading license
Traceback (most recent call last):
  File "D:\waste\thingygrabber\thingy_grabber.py", line 666, in <module>
    main()
  File "D:\waste\thingygrabber\thingy_grabber.py", line 654, in main
    Designs(user, args.directory, args.quick).download()
  File "D:\waste\thingygrabber\thingy_grabber.py", line 248, in download
    RC = Thing(thing).download(self.download_dir)
  File "D:\waste\thingygrabber\thingy_grabber.py", line 537, in download
    license_handle.write("{}\n".format(self._license))
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-13: character maps to <undefined>
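For reference, the traceback shows the license text being encoded with cp1252, which suggests the file was opened in text mode without an explicit encoding and picked up the Windows default. Below is a minimal sketch of the usual workaround, assuming that is the cause. `write_license` and the `license.txt` file name are hypothetical and not taken from thingy_grabber.py.

```python
import os
import tempfile


def write_license(directory: str, license_text: str) -> str:
    """Write the license text as UTF-8, whatever the platform default encoding is."""
    path = os.path.join(directory, "license.txt")  # hypothetical file name
    # An explicit encoding avoids falling back to cp1252 on Windows.
    with open(path, "w", encoding="utf-8") as license_handle:
        license_handle.write("{}\n".format(license_text))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_license(tmp, "Creative Commons 署名"))
```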

cwoac commented 4 years ago

Moved your comment to a new issue as that is not what this one is about.