Sicos1977 / MSGReader

C# Outlook MSG file reader without the need for Outlook
http://sicos1977.github.io/MSGReader
MIT License

Fix HtmlToText stack overflow error when converting HTML comments #394

Closed. IagoGrah closed this 7 months ago

IagoGrah commented 7 months ago

Hello,

A few months ago, while working, I found a problem with the HtmlToText class. I ran into several stack overflows while trying to read some MSG files. Those files were very large, but more importantly they had a big HTML comment right at the start.

The reader never found the end of the comment tag (-->), because the tag was never marked as self-closing: it has no '/' before the closing '>'. As reading continued through the large file, the repeated calls to the EatInnerContent method ended in a stack overflow exception.
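To illustrate the idea, here is a minimal sketch of treating an HTML comment as a single unit and skipping it with an iterative scan, so that the size of the input cannot grow the call stack. It is not the actual change to HtmlToText: the class name `CommentSkipSketch` and the method `StripComments` are hypothetical, and the sketch assumes plain `<!-- ... -->` comments and treats an unterminated comment as running to the end of the input.

```csharp
using System;
using System.Text;

// Hypothetical helper, not MSGReader's HtmlToText. It only illustrates
// handling "<!-- ... -->" as one unit instead of as an unclosed tag.
public static class CommentSkipSketch
{
    private const string CommentOpen = "<!--";
    private const string CommentClose = "-->";

    public static string StripComments(string html)
    {
        var result = new StringBuilder(html.Length);
        var pos = 0;

        // Iterative scan: no recursion, so large inputs cannot overflow the stack.
        while (pos < html.Length)
        {
            var start = html.IndexOf(CommentOpen, pos, StringComparison.Ordinal);
            if (start < 0)
            {
                // No more comments: keep the remainder unchanged.
                result.Append(html, pos, html.Length - pos);
                break;
            }

            // Keep everything that precedes the comment.
            result.Append(html, pos, start - pos);

            var end = html.IndexOf(CommentClose, start + CommentOpen.Length, StringComparison.Ordinal);
            if (end < 0)
                break; // Assumption: an unterminated comment swallows the rest of the input.

            // Resume scanning right after the closing "-->".
            pos = end + CommentClose.Length;
        }

        return result.ToString();
    }
}
```

For example, `CommentSkipSketch.StripComments("<!-- big comment --><p>Hello</p>")` returns `<p>Hello</p>`, so the content after the comment is still available for conversion.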

Unfortunately I can't share the original files because they contain sensitive information, and I didn't bother making another file large enough to overflow as well. But I did notice that smaller files, while not overflowing, were not converted either and returned empty text.

I tested a range of cases and possible comment formats, with and without my change. Without my change, any form of comment resulted in the rest of the HTML not being read. With my change, the huge files that used to overflow the stack returned their whole text content with no problem.

I'm attaching below the console app I used for testing, along with an example txt file containing a comment. It uses a copy of the class, just for convenience; feel free to open, run, and debug it. MsgReaderStackOverflow.zip

Not claiming it's the best solution but it helped my work greatly 🤝 If you need more info, I'll do my best.