brendonh / pyth

Python text markup and conversion
MIT License
89 stars, 79 forks

RTF control word "\f0" not recognized #2

Closed (joka closed 14 years ago)

joka commented 14 years ago

I'm using RTF files generated by pandoc. They contain a lot of "\f0" control words (no idea why).

/plugins/rtf15/reader.py cannot read these files because of this "\f0" word.

For a general solution, could you skip unknown control words?

Example RTF: {\rtf\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}{\f1\fdecor Symbol;}{\f2\fswiss Helv;}}{\colortbl;\red0\green0\blue0; \red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;}{\stylesheet{\fs20 \snext0Normal;}}{\info{\author John Doe} {\creatim\yr1990\mo7\dy30\hr10\min48}{\version1}{\edmins0} {\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\widoctrl\ftnbj \sectd\linex0\endnhere \pard\plain \fs20 This is plain text.\
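
To make the request concrete, here is a rough sketch of what "skip unknown control words" could look like. It is not pyth's reader: `CONTROL_WORD`, `KNOWN` and `plain_text` are names made up for this illustration, and the sketch ignores groups, destinations, control symbols and escapes, so it only shows the skipping idea.

```python
import re

# An RTF control word: backslash, lowercase letters, an optional (possibly
# negative) numeric parameter, and an optional single space delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?[ ]?")

# Hypothetical table of control words this sketch knows how to render.
KNOWN = {"par": "\n"}


def plain_text(rtf_body):
    """Return the text of rtf_body, silently dropping unknown control words."""
    out = []
    pos = 0
    while pos < len(rtf_body):
        match = CONTROL_WORD.match(rtf_body, pos)
        if match:
            # Known words emit their replacement; unknown ones emit nothing.
            out.append(KNOWN.get(match.group(1), ""))
            pos = match.end()
        else:
            char = rtf_body[pos]
            if char not in "{}":  # group braces carry no text in this sketch
                out.append(char)
            pos += 1
    return "".join(out)


print(repr(plain_text(r"\pard\plain \f0\fs20 This is plain text.\par")))
```

With this approach "\f0" is consumed like any other unrecognized word instead of stopping the reader.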

brendonh commented 14 years ago

Hi Joka. Sorry for the slow response, I'm away from home.

\f0 is a standard control word that the RTF reader normally handles. From your example, I'm not sure what problem it's having.

I don't see a way to attach files here, so could you email your whole RTF file to me at brendonh@gmail.com? I'll figure out what's tripping it up.

Cheers, Brendon

joka commented 14 years ago

OK, fine. And thank you for pyth; it's really nice to have a Pythonic RTF reader.

brendonh commented 14 years ago

I think I've fixed this (in trunk). Pyth was ignoring font declarations that didn't have a \fcharset. Now they default to the reader's charset (e.g. from the initial \ansi) instead, which I think is the right thing to do -- the spec isn't clear.

It seems to work for your example doc, anyway.
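
For anyone reading later, the fallback described above can be sketched roughly as follows. This is not the actual code in trunk: `font_codecs` and the two lookup tables are hypothetical names for this illustration, and the tables list only a few charset values. The idea is that a font entry with a \fcharset uses that charset, and an entry without one falls back to the document-level charset.

```python
import re

# Partial, illustrative mapping from \fcharsetN values to Python codec names.
FCHARSET_CODECS = {0: "cp1252", 128: "cp932", 204: "cp1251"}

# Illustrative mapping from document-level charset keywords to codec names.
DOC_CHARSET_CODECS = {
    "ansi": "cp1252",
    "mac": "mac_roman",
    "pc": "cp437",
    "pca": "cp850",
}

# One font table entry, e.g. {\f0\froman Tms Rmn;}
FONT_ENTRY = re.compile(r"\{\\f(\d+)([^{}]*)\}")
FCHARSET = re.compile(r"\\fcharset(\d+)")


def font_codecs(font_table, doc_charset="ansi"):
    """Map each font number to a codec, defaulting to the document charset."""
    default = DOC_CHARSET_CODECS[doc_charset]
    codecs = {}
    for number, body in FONT_ENTRY.findall(font_table):
        match = FCHARSET.search(body)
        if match:
            # Charset values missing from the table also fall back to the default.
            codecs[int(number)] = FCHARSET_CODECS.get(int(match.group(1)), default)
        else:
            # No \fcharset at all: keep the font and use the document charset.
            codecs[int(number)] = default
    return codecs


table = r"{\fonttbl{\f0\froman Tms Rmn;}{\f1\fnil\fcharset204 Arial;}}"
print(font_codecs(table))
```

Here font 0 has no \fcharset, so it takes the codec implied by the initial \ansi rather than being dropped.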