joka closed 14 years ago
This is a legitimate bug but I'm not sure what the correct fix is. It's possible to construct a surrogate pair to represent the character, with something like:
struct.pack('<L', 0x10000).decode('utf-32-le')
Which will "work", but perhaps cause other bugs down the line with e.g. slicing. So maybe we shouldn't do it.
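For reference, here is a minimal sketch of the arithmetic behind such a surrogate pair. The helper name `to_surrogate_pair` is made up for illustration and is not part of the reader. On a narrow build the resulting string holds two code units, so len() reports 2 for one character and a slice can cut the pair in half.

```python
def to_surrogate_pair(codepoint):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = codepoint - 0x10000        # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))
```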
Other alternatives include replacing it with '?', or just raising a different exception type. I haven't decided.
I like the solution of replacing it with '?' plus a log message.
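A rough sketch of what that could look like, assuming the reader converts numeric code points one at a time. The names `safe_unichr` and `log` are hypothetical and not taken from the actual reader code.

```python
import logging

log = logging.getLogger(__name__)

try:
    _to_char = unichr   # Python 2
except NameError:
    _to_char = chr      # Python 3


def safe_unichr(codepoint, replacement=u'?'):
    """Convert a code point to a character, falling back to a placeholder."""
    try:
        return _to_char(codepoint)
    except (ValueError, OverflowError):
        # On a narrow build, unichr() raises ValueError above 0xFFFF.
        log.warning("Cannot represent code point %r, using %r instead",
                    codepoint, replacement)
        return replacement
```

This keeps string lengths predictable for plugins, at the cost of losing the original character.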
I've tried the struct trick above. It means that plugins should never trust len() of unicode strings, or slice them. But that's probably true anyway.
I have an RTF file with strange Unicode strings (I've sent you an email).
This causes the RTF reader to throw ValueError:
The reason is that my Python was built without support for "wide" Unicode characters (see http://www.python.org/dev/peps/pep-0261/). However, some exception handling would be nice.
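For anyone who wants to check their own interpreter: a narrow build in the PEP 261 sense reports `sys.maxunicode` as 0xFFFF, a wide build as 0x10FFFF. A small sketch, with `is_narrow_build` being an illustrative name only:

```python
import sys


def is_narrow_build():
    """Return True if this interpreter cannot hold code points above U+FFFF."""
    return sys.maxunicode == 0xFFFF


print("narrow build" if is_narrow_build() else "wide build")
```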