fadeyev opened this issue 4 years ago
Perfect. The change probably belongs in bbottema/rtf-to-html, actually.
Ah, my bad, sorry; you can move the request to that project if you like.
It's fine like this, no problem.
I've talked with @kschroeer, and he is willing to have his code merged with this code base into one cohesive solution. He did stress that the solution must not depend on any other libraries, to keep it as lightweight as possible, which I fully agree with.
Swing could be an optional dependency for people who really want to use that option; I'd also like to keep it available for completeness' sake.
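Because Swing ships with the JDK itself (the `java.desktop` module), an optional Swing-backed converter would need no third-party jars. A minimal sketch, assuming the JDK's `RTFEditorKit` and `HTMLEditorKit` are what the "Swing option" would use; the class name `SwingRtfToHtml` is hypothetical and not taken from either project. Swing's RTF reader supports only a subset of RTF, so this would be a fallback rather than the main path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.StyledDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical optional backend: relies only on the JDK's java.desktop module (Swing).
final class SwingRtfToHtml {

    static String convert(String rtf) {
        try {
            // Parse the RTF into a Swing styled document (character runs with bold, italic, size, ...).
            StyledDocument doc = new DefaultStyledDocument();
            new RTFEditorKit().read(
                    new ByteArrayInputStream(rtf.getBytes(StandardCharsets.ISO_8859_1)), doc, 0);

            // For a document that is not an HTMLDocument, HTMLEditorKit.write
            // uses Swing's minimal HTML writer.
            StringWriter out = new StringWriter();
            new HTMLEditorKit().write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (IOException | BadLocationException e) {
            throw new IllegalArgumentException("Could not convert RTF", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("{\\rtf1\\ansi Hello \\b World\\b0}"));
    }
}
```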
Finally, the result should be as you state in your opening post: take kschroeer/rtf-html-java as the base, add the specifics of the RFC-compliant converter, and define defaults for non-RTF-HTML elements.
When viewing these two RTF emails:
https://github.com/Sicos1977/MSGReader/blob/master/MsgReaderTests/SampleFiles/RtfSampleEmail.msg
https://github.com/Sicos1977/MSGReader/blob/master/MsgReaderTests/SampleFiles/RtfSampleEmailWithAttachment.msg
I get the following as the textHTML (screenshot from the second one, as the first contains far too much text):
Is this related to this enhancement or a separate issue?
As discussed in https://github.com/bbottema/outlook-message-parser/pull/15, there are Outlook msg files that have only an RTF body, created directly as RTF rather than from HTML (you can create such an email in Outlook by selecting the FORMAT TEXT tab -> Format section -> Rich Text when composing a new message). The current parser doesn't render such emails into anything even close to readable.
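To route such a body to the right converter, the code first needs to know whether the RTF encapsulates HTML at all. Per Microsoft's [MS-OXRTFEX] specification, RTF generated from HTML carries `\fromhtml1` in its header and RTF generated from plain text carries `\fromtext`; a message composed as Rich Text, as described above, would presumably carry neither. A heuristic sketch: the `RtfOrigin` class is hypothetical, and it assumes the marker appears before `{\fonttbl`, as in typical Outlook output.

```java
import java.util.regex.Pattern;

// Hypothetical helper: classifies an RTF body by the origin markers defined in [MS-OXRTFEX].
final class RtfOrigin {

    enum Kind { ENCAPSULATED_HTML, ENCAPSULATED_TEXT, NATIVE_RTF }

    // \fromhtml1 = RTF generated from HTML; \fromtext = RTF generated from plain text.
    // The lookaheads make sure the whole control word is matched, not a prefix of a longer one.
    private static final Pattern FROM_HTML = Pattern.compile("\\\\fromhtml1(?![0-9])");
    private static final Pattern FROM_TEXT = Pattern.compile("\\\\fromtext(?![A-Za-z])");

    static Kind classify(String rtf) {
        // Assumption: the markers sit in the header before the font table, as in typical
        // Outlook output; searching only that prefix avoids matching body content.
        int end = rtf.indexOf("{\\fonttbl");
        String header = end >= 0 ? rtf.substring(0, end) : rtf;
        if (FROM_HTML.matcher(header).find()) return Kind.ENCAPSULATED_HTML;
        if (FROM_TEXT.matcher(header).find()) return Kind.ENCAPSULATED_TEXT;
        return Kind.NATIVE_RTF; // no marker: needs the generic RTF-to-HTML path
    }

    public static void main(String[] args) {
        System.out.println(classify("{\\rtf1\\ansi\\fromhtml1 \\deff0{\\fonttbl{\\f0 Arial;}}x}"));
        System.out.println(classify("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\pard\\plain \\f0\\b Hi\\par}"));
    }
}
```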
To support this, we need a generic RTF parser that can parse any RTF file and then convert it to HTML. It should handle all RTF formatting, such as `\pard\plain \f0\b`, and convert it to HTML tags (like `<div>`, `<span>`, etc.) and style attributes (like `font-size`, `font-family`, etc.). We could probably combine the current parser with the generic one written by kschroeer/rtf-html-java.
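A minimal sketch of such a generic path in plain Java 17 with no external libraries, assuming a hand-written single-pass scanner; it is not the code of either project, and `MiniRtfToHtml` is a hypothetical name. It handles groups, `\b`, `\i`, `\fs`, `\plain`, `\par` and `\'hh` hex escapes (decoded as Latin-1, a simplification of the declared code page), and skips the `\fonttbl`, `\colortbl`, `\stylesheet`, `\info` and `\*` destinations. Fonts, colors, paragraph formatting and Unicode `\uN` escapes are left out (for the latter, the fallback character is emitted). Paragraphs become `<div>` elements and formatting runs become `<span>` elements with `font-size`, `font-weight` and `font-style`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Hypothetical sketch of a dependency-free, single-pass RTF-to-HTML converter (Java 17).
final class MiniRtfToHtml {

    /** Character formatting in effect; saved on '{' and restored on '}'. */
    static final class Style {
        boolean bold, italic, skip; // skip = inside a destination that is not body text
        int halfPoints = 24;        // RTF font sizes are in half-points; \fs24 = 12pt
        Style copy() {
            Style s = new Style();
            s.bold = bold; s.italic = italic; s.skip = skip; s.halfPoints = halfPoints;
            return s;
        }
        String css() {
            String size = halfPoints % 2 == 0 ? String.valueOf(halfPoints / 2) : String.valueOf(halfPoints / 2.0);
            return "font-size:" + size + "pt"
                    + (bold ? ";font-weight:bold" : "")
                    + (italic ? ";font-style:italic" : "");
        }
    }

    // Destinations holding metadata rather than body text.
    private static final Set<String> SKIPPED = Set.of("fonttbl", "colortbl", "stylesheet", "info");

    static String convert(String rtf) {
        StringBuilder html = new StringBuilder("<div>");
        Deque<Style> saved = new ArrayDeque<>();
        Style cur = new Style();
        String openCss = null; // style of the currently open <span>, or null
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i++);
            String text = null;
            if (c == '{') { saved.push(cur); cur = cur.copy(); }
            else if (c == '}') { if (!saved.isEmpty()) cur = saved.pop(); }
            else if (c == '\r' || c == '\n') { /* raw line breaks are not content in RTF */ }
            else if (c != '\\') text = String.valueOf(c);
            else if (i < rtf.length() && !Character.isLetter(rtf.charAt(i))) {
                char s = rtf.charAt(i++); // control symbol: \\ \{ \} \'hh \*
                if (s == '*') cur.skip = true; // ignorable destination
                else if (s == '\'' && i + 2 <= rtf.length()) {
                    text = String.valueOf((char) Integer.parseInt(rtf.substring(i, i + 2), 16)); // Latin-1 simplification
                    i += 2;
                } else if (s == '\\' || s == '{' || s == '}') text = String.valueOf(s);
            } else {
                // Control word: letters, optional signed number, optional single space delimiter.
                int start = i;
                while (i < rtf.length() && Character.isLetter(rtf.charAt(i))) i++;
                String word = rtf.substring(start, i);
                int numStart = i;
                if (i < rtf.length() && rtf.charAt(i) == '-') i++;
                while (i < rtf.length() && Character.isDigit(rtf.charAt(i))) i++;
                String num = rtf.substring(numStart, i);
                Integer param = num.isEmpty() || num.equals("-") ? null : Integer.valueOf(num);
                if (i < rtf.length() && rtf.charAt(i) == ' ') i++;
                switch (word) {
                    case "b" -> cur.bold = param == null || param != 0;
                    case "i" -> cur.italic = param == null || param != 0;
                    case "fs" -> { if (param != null) cur.halfPoints = param; }
                    case "plain" -> { cur.bold = false; cur.italic = false; cur.halfPoints = 24; }
                    case "par" -> {
                        if (!cur.skip) {
                            if (openCss != null) { html.append("</span>"); openCss = null; }
                            html.append("</div><div>");
                        }
                    }
                    default -> { if (SKIPPED.contains(word)) cur.skip = true; } // \pard, \f0, ... ignored
                }
            }
            if (text != null && !cur.skip) {
                String css = cur.css();
                if (!css.equals(openCss)) { // formatting changed: start a new run
                    if (openCss != null) html.append("</span>");
                    html.append("<span style=\"").append(css).append("\">");
                    openCss = css;
                }
                html.append(escape(text));
            }
        }
        if (openCss != null) html.append("</span>");
        return html.append("</div>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}"
                + "\\pard\\plain \\f0\\fs20 Hello \\b world\\b0 !\\par Second line}";
        System.out.println(convert(rtf));
    }
}
```

Running `main` puts `Hello `, the bold `world` and `!` into three `<span>` runs inside one `<div>`, and `Second line` into a second `<div>`. Mapping `\f0` to `font-family` would additionally require keeping the parsed `\fonttbl` instead of skipping it.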