brendonh / pyth

Python text markup and conversion
MIT License

rtf reader: decode argument TypeError #7

Closed joka closed 14 years ago

joka commented 14 years ago

I've sent you a test file.

Module pyth.plugins.rtf15.reader, line 103, in read
Module pyth.plugins.rtf15.reader, line 124, in go
Module pyth.plugins.rtf15.reader, line 155, in parse 
Module pyth.plugins.rtf15.reader, line 385, in char
TypeError: decode() argument 1 must be string, not dict

brendonh commented 14 years ago

So far as I can tell, the RTF file you sent me for this is invalid: it clearly defines \f3 as a Symbol font, and then uses it for the text "Normal Table", whose characters aren't valid Symbol codepoints.

That said, I've attempted three different fixes:

  1. Symbol is now implemented as a custom codec, rather than a dict and typecheck tricks in the reader.
  2. You can pass an "errors" argument to the reader, with value "replace" or "strict"; the reader passes it along to its codecs, including Symbol, and also uses it for non-BMP Unicode characters on narrow Python builds.
  3. Inside an ignored group and its subgroups, everything is now totally ignored, so that we can't crash on things we don't handle inside groups we don't care about. This "fixes" your sample file.
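To illustrate fixes 1 and 2, here is a minimal sketch of how a Symbol-style custom codec can honour an "errors" policy. It is not pyth's actual code: the codec name `symbol_sketch`, the helper names, and the five-entry table are all assumptions made for illustration, and it is written for Python 3 rather than the Python 2 of the time.

```python
import codecs

# Partial, illustrative table (an assumption, not pyth's real table):
# a few Symbol-font byte values and the Unicode characters they stand for.
SYMBOL_DECODING = {
    0x20: " ",
    0x61: "\u03b1",  # 'a' -> Greek small alpha
    0x62: "\u03b2",  # 'b' -> Greek small beta
    0x67: "\u03b3",  # 'g' -> Greek small gamma
    0x64: "\u03b4",  # 'd' -> Greek small delta
}

# Reverse table for encoding: Unicode code point -> Symbol byte value.
SYMBOL_ENCODING = {ord(ch): byte for byte, ch in SYMBOL_DECODING.items()}


def symbol_decode(data, errors="strict"):
    # Bytes missing from the table are handled according to `errors`:
    # "strict" raises UnicodeDecodeError, "replace" yields U+FFFD.
    return codecs.charmap_decode(data, errors, SYMBOL_DECODING)


def symbol_encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, SYMBOL_ENCODING)


def _search(name):
    # Codec search function: answer only for our hypothetical codec name.
    if name == "symbol_sketch":
        return codecs.CodecInfo(
            encode=symbol_encode, decode=symbol_decode, name="symbol_sketch"
        )
    return None


codecs.register(_search)
```

With this registered, `b"ab".decode("symbol_sketch")` goes through the table, while bytes outside it, such as most of `b"Normal"`, either raise under "strict" or become U+FFFD under "replace". That is the same choice the reader's "errors" argument exposes.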

joka commented 14 years ago

Cool, that sounds good. But it's a shame that Word 2003 somehow produces invalid RTF files.