log2akshat opened this issue 3 years ago (status: Open)
Unfortunately, putting ZZZ in there makes that not an HTML comment.
The lexer should recognize that <style> content is raw text, so you should be able to use a pre-processor to modify the text based on the not-quite-HTML dialect that you're dealing with.
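As a rough illustration of the pre-processing idea, here is a minimal sketch that rewrites only the raw text inside <style> elements before the HTML is handed to the sanitizer. It works on plain strings and does not use the sanitizer's own API; the class name StyleTextPreprocessor, the method rewriteStyleText, and the caller-supplied fix function are all hypothetical, and what fix should actually do depends on the dialect in your e-mails.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the sanitizer library.
final class StyleTextPreprocessor {
    // Matches <style ...>, its raw text, and </style>, ignoring case and
    // spanning line breaks. Assumption: the open tag has no '>' inside
    // an attribute value.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
        "(<style\\b[^>]*>)(.*?)(</style\\s*>)",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private StyleTextPreprocessor() {}

    // Applies `fix` to the raw text of every <style> element and leaves
    // everything outside those elements untouched.
    static String rewriteStyleText(String html, UnaryOperator<String> fix) {
        Matcher m = STYLE_BLOCK.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy everything up to the start of the raw text,
            // including the opening <style> tag.
            out.append(html, last, m.start(2));
            // Replace the raw text with its fixed-up version.
            out.append(fix.apply(m.group(2)));
            last = m.end(2);
        }
        // Copy the remainder, including the last </style> tag.
        out.append(html, last, html.length());
        return out.toString();
    }
}
```

You would call something like StyleTextPreprocessor.rewriteStyleText(body, css -> css.replace("ZZZ", "")) and pass the result to the sanitizer. Stripping the literal ZZZ is only a placeholder for whatever normalization your input needs.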
Hi,
We are using this library in Zimbra to sanitize e-mail bodies. While sanitizing customer-generated HTML, we came across a case where text inside an HTML comment prevents the sanitizer from parsing the content properly.
After removing the text ZZZ from the comment, the content of the e-mail body is displayed properly. The sanitizer seems to treat the comment as a nested tag inside the style tag and searches for the closing </ of that nested tag. When it finds </style>, it takes that as the closing of the nested tag <ZZZ>, which leaves the other tags unbalanced.
I have also gone through the code of HtmlLexer but could not figure out a fix: the parsing happens one character at a time, so regex pattern matching does not look like a workable way to handle this issue.
It would be great if someone could guide me on how to handle this situation, or if this could be considered as an enhancement or bug fix.