watercrossing closed 10 years ago
This looks pretty great. I'm totally okay with having images in the intermediate documents. I'm not quite so happy about having them contain rtf-specific properties, but I also don't see how it could do much better without a huge amount of work, so it's fine.
Can you add a new sample file with an image in it for demonstration?
I agree with you, it would certainly be preferable to drop the rtf-specific instructions - but that would require thinking of something else which could handle the data. I couldn't find a Python library that would abstract the image data away neatly, so I think it's best left as it is for now.
I have added a sample file, and a small script along the lines of the previous version demonstrating the behaviour.
One other point: many text editors (LibreOffice and MS Word, for example) save images both in the native format (png, jpg, emf, QuickDraw) and as an uncompressed copy in Windows metafile format, because WordPad (and others) can only read Windows metafiles. The test .rtf I added also has both versions. This explains the bloated file size - the original png is 11KB, the uncompressed metafile about 2MB. This pull request will just return both images one after the other, so that the user can choose which one is wanted.
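As a rough illustration of how a caller might choose between the two copies, here is a minimal sketch. `ImageRecord`, `blip_type` and `prefer_native` are hypothetical names made up for this example and are not the objects this pull request returns; the sketch also assumes that every Windows metafile in the document is only a fallback copy of a native-format image.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical stand-in for an image returned by the reader."""
    blip_type: str  # rtf picture type keyword, e.g. "pngblip" or "wmetafile8"
    data: bytes     # decoded image bytes


def prefer_native(images):
    """Drop Windows metafile copies when at least one native-format image exists.

    Assumption: metafiles are only fallback duplicates. If the document
    contains nothing but metafiles, they are all kept.
    """
    native = [img for img in images if not img.blip_type.startswith("wmetafile")]
    return native or list(images)


images = [ImageRecord("pngblip", b"png bytes"), ImageRecord("wmetafile8", b"wmf bytes")]
print([img.blip_type for img in prefer_native(images)])
```

A document that mixes genuine metafile images with fallback copies would need a smarter pairing rule than this one.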
Okay, works for me!
Currently, when opening an rtf document which contains an image, the image is parsed as a Paragraph, with the bytestring of the image as the Text. This makes filtering out images cumbersome - one has to filter out Texts based on their string lengths - and it messes up the representation too.
This pull request adds basic support for images: a new Image class has been created, basically analogous to a text class, but it stores all relevant image metadata as defined in the rtf specification. The parser has been extended to fill the Image class appropriately.
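To make the idea concrete, here is a minimal sketch of what such a class could look like. It is not the code from this pull request: the `Image` dataclass, its `properties` dict and the `image_from_pict` helper are assumptions made for illustration. It only handles hex-encoded picture data, which is how rtf normally stores pictures, and ignores raw `\bin` data.

```python
import binascii
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical sketch of an image node, analogous to a text node."""
    data: bytes                                     # decoded picture bytes
    properties: dict = field(default_factory=dict)  # rtf picture control words


def image_from_pict(control_words, hex_data):
    """Build an Image from the contents of an rtf \\pict group.

    control_words: mapping such as {"pngblip": None, "picw": 10, "pich": 5}
    hex_data: the picture payload as hex digits, possibly split across lines
    """
    cleaned = "".join(hex_data.split())  # remove line breaks and spaces
    return Image(data=binascii.unhexlify(cleaned), properties=dict(control_words))


# The payload below is just the 8-byte PNG signature, split over two lines.
img = image_from_pict({"pngblip": None, "picw": 10, "pich": 5}, "89504e47\n0d0a1a0a")
print(img.data.startswith(b"\x89PNG"))
```

Keeping the control words in a plain dict means the rtf-specific keys stay visible in the intermediate document, which is the trade-off discussed in this thread.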
I do realise that this project focuses primarily on "marked up text", so an alternative approach would be to drop images entirely, instead of putting them in a new image class.