apprenticeharper / DeDRM_tools

DeDRM tools for ebooks

Some unicode title characters cause error. #738

Open jhaisley opened 5 years ago

jhaisley commented 5 years ago

DeDRM.app/Contents/Resources/k4mobidedrm.py", line 272, in decryptBook
    outfilename = '{}_{}'.format(orig_fn_root, clean_title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 11: ordinal not in range(128)

As a workaround, I modified k4mobidedrm.py, changing clean_title = cleanup_name(book.getBookTitle()) to clean_title = "CleanBook". This worked, of course, and I then renamed the file manually.

A better fix would be adding:

# delete extended characters

name = u"".join(char for char in name if ord(char)<=126)

to cleanup_name() since extended characters are likely to cause various issues.
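To make the trade-off of this proposal concrete, here is a minimal sketch of the stripping step on its own. The function name `strip_extended` is hypothetical and is not part of DeDRM_tools; only the join expression comes from the suggestion above.

```python
def strip_extended(name):
    """Drop every character whose code point is above 126 (hypothetical helper)."""
    # Same expression as proposed above: keeps ASCII up to '~', drops the rest.
    return u"".join(char for char in name if ord(char) <= 126)


# U+2013 (en dash) is the character from the traceback; it is removed entirely.
title = u"Part One \u2013 Two"
stripped = strip_extended(title)
```

Note that a title written entirely in a non-Latin script would be reduced to an empty string, so this avoids the encode error at the cost of losing information.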

apprenticeharper commented 5 years ago

I'd rather have cleanup_name handle Unicode properly than just strip non-ASCII characters.
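One way this could look is sketched below. It is not the project's actual cleanup_name; the name `cleanup_name_sketch`, the set of characters treated as invalid, and the UTF-8 default are all assumptions made for illustration. The idea is to convert the title to a text string first, then remove only what file systems commonly reject, leaving non-ASCII characters intact.

```python
import re

# Assumption: characters commonly rejected in file names on major platforms.
_INVALID_CHARS = re.compile(u'[<>:"/\\\\|?*]')


def cleanup_name_sketch(name, encoding="utf-8"):
    """Hypothetical Unicode-preserving cleanup; not the DeDRM implementation."""
    # Decode byte strings so the rest of the function works on text only.
    if isinstance(name, bytes):
        name = name.decode(encoding)
    # Replace characters that are invalid in file names with underscores.
    name = _INVALID_CHARS.sub(u"_", name)
    # Drop control characters (code points below 32), keep everything else.
    name = u"".join(ch for ch in name if ord(ch) >= 32)
    return name.strip()
```

Because `bytes` is an alias for `str` on Python 2, the same check covers both Python 2 byte strings and Python 3 `bytes` objects.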

qpwo commented 5 years ago

Exact same error happened to me. I think

outfilename = u'{}_{}'.format(orig_fn_root, clean_title.decode('utf8'))

fixes it and keeps the unicode. Minimal example:

>>> a = "dog"
>>> b = "contains – unicode"
>>> filename = u"{}_{}".format(a, b.decode('utf8'))
>>> with open(filename, 'w') as f: f.write("hello")

The file looks good in my file browser.
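The same idea can be written so that it does not fail when the title is already a text string, since calling .decode() on a unicode object in Python 2 can itself raise an encode error. This is only a sketch: `to_text` and `build_outfilename` are hypothetical names, and UTF-8 is an assumed encoding for the title bytes.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_outfilename(orig_fn_root, clean_title):
    """Join both parts as text so no implicit ASCII conversion happens."""
    return u"{}_{}".format(to_text(orig_fn_root), to_text(clean_title))


# Byte-string title containing U+2013, like the one from the traceback.
name = build_outfilename("dog", u"contains \u2013 unicode".encode("utf-8"))
```

Decoding both parts up front keeps the formatting step entirely in text, which is what avoids the implicit 'ascii' codec conversion in the original line.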