bbottema / outlook-message-parser

A Java parser for Outlook messages (.msg files)

SimpleRTF2HTMLConverter inserts too many <br/> tags #11

Closed: fadeyev closed this issue 5 years ago

fadeyev commented 5 years ago

I'm not sure what the purpose of line 118 in SimpleRTF2HTMLConverter#fetchHtmlSection() is: html = html.replaceAll("[\\n\\r]+", " <br/> "); It replaces every run of newlines in the extracted HTML with a <br/> tag, which results in a whole lot of extra <br/> tags. When I send an email with that HTML, it looks awful, with lots of extra blank lines. When I replaced each <br/> back with a newline \n and sent the email, it looked just like the original. I tried this on about 10 different emails of various complexity: replacing newlines with <br/> broke all of them completely, while removing this line made them render just like the originals.
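A minimal sketch of the effect described above. Only the replaceAll call is taken from the converter; the class name BrInsertionDemo, the count helper, and the sample HTML are made up for illustration. The sample assumes the extracted HTML has CRLF line endings between tags, which is where the extra <br/> tags come from.

```java
public class BrInsertionDemo {

    // The replacement quoted from the issue: each run of CR/LF characters
    // is turned into a single <br/> surrounded by spaces.
    static String replaceNewlinesWithBr(String html) {
        return html.replaceAll("[\\n\\r]+", " <br/> ");
    }

    // Hypothetical helper: counts non-overlapping occurrences of a literal token.
    static int count(String text, String token) {
        int n = 0;
        for (int i = text.indexOf(token); i >= 0; i = text.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up sample: well-formed HTML whose source lines are separated by CRLF.
        // In HTML these newlines are insignificant whitespace between block elements.
        String html = "<html>\r\n<body>\r\n<p>Hello</p>\r\n<p>World</p>\r\n</body>\r\n</html>";

        String converted = replaceNewlinesWithBr(html);
        System.out.println(converted);
        // Five line separators in the source become five visible line breaks.
        System.out.println("<br/> tags added: " + count(converted, "<br/>"));
    }
}
```

Newlines between tags are normally collapsed by an HTML renderer, so turning each of them into a <br/> adds visible line breaks that the original message never had.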

bbottema commented 5 years ago

It's been a while since I looked at the RTF spec, but aren't newlines in RTF encoded with \n\r? That would mean they should be HTML newlines (br's) as well. I see this was in the original sources as well.

/edit Removing that line doesn't seem to cause issues for me in the unit tests, but I remember having formatting issues with a Chinese email. I can't recall the details, though. I think I'll remove it, since you have more evidence to the contrary.

bbottema commented 5 years ago

Released v1.3.0