brendonh / pyth

Python text markup and conversion
MIT License

Small fix for a crash on some types of RTFs #8

Closed ofri closed 10 years ago

ofri commented 13 years ago

Some RTFs contain fontNum=0 without declaring it, which makes the RtfReader fail. I didn't really read the RTF specs, so accept my apologies in advance if such RTFs are actually invalid and should fail the parsing. I found many of these RTFs on the Israeli Knesset website, which I use pyth to read.

brendonh commented 13 years ago

Man, how did I miss this for so long? Sorry.

Is ignoring missing fonts really the right behaviour here? I'm wondering if it should fall back to cp1252 instead.

What sort of text is in fontNum=0 blocks in your RTFs?

ofri commented 13 years ago

I can't seem to reproduce this now, using 0.5.6. Could it be that some other change fixed it? Fixes in reading the charset table, maybe?

My intention was not to ignore the text in the missing font, but to ignore the font switch. I assumed that if I don't execute the self.charset assignment, I'll just be using some default/fallback charset.

brendonh commented 13 years ago

You'd be using whatever charset the containing block was in, while I'm guessing fontNum=0 should indeed be the default charset, cp1252.
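To make the difference concrete, here is a minimal sketch of the two behaviours. It is not pyth's actual code: `FontState`, `switch_font`, `fallback_to_default` and the font table layout are all hypothetical names made up for illustration, and cp1252 as the default is just my guess from above.

```python
DEFAULT_CHARSET = "cp1252"  # assumed default, per the guess above


class FontState:
    """Hypothetical stand-in for the reader's charset tracking (not pyth's API)."""

    def __init__(self, font_table, fallback_to_default):
        # font_table maps declared font numbers to Python codec names.
        self.font_table = font_table
        self.fallback_to_default = fallback_to_default
        self.charset = DEFAULT_CHARSET

    def switch_font(self, font_num):
        charset = self.font_table.get(font_num)
        if charset is not None:
            # Declared font: use its charset.
            self.charset = charset
        elif self.fallback_to_default:
            # Undeclared font: reset to the default charset.
            self.charset = DEFAULT_CHARSET
        # Otherwise ignore the switch and keep the containing block's charset.

    def decode(self, raw):
        return raw.decode(self.charset)


# Only font 1 (a Hebrew codepage) is declared; font 0 is used but never declared.
font_table = {1: "cp1255"}

ignore_switch = FontState(font_table, fallback_to_default=False)
ignore_switch.switch_font(1)
ignore_switch.switch_font(0)  # undeclared: charset stays cp1255

use_default = FontState(font_table, fallback_to_default=True)
use_default.switch_font(1)
use_default.switch_font(0)  # undeclared: charset resets to cp1252

# The same byte decodes differently under the two policies.
print(repr(ignore_switch.decode(b"\xe0")))  # '\u05d0' (Hebrew alef)
print(repr(use_default.decode(b"\xe0")))    # '\xe0' (a with grave accent)
```

So which policy is right depends on what the text in those fontNum=0 blocks actually is.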

Have you tested it on the same files you were having problems with before? If so, I'll close this.