OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

The sanitizer removes tags that contain content when the tag nesting depth exceeds 256 #205

Open rasmitam opened 4 years ago

rasmitam commented 4 years ago

I have an HTML document that is wrapped in a long list of empty <div> tags, and the sanitizer removes a portion of it. When I removed the empty <div> tags from the HTML, the sanitizer no longer stripped that portion. The HTML that shows the issue is below. Please suggest what can be done about this.

<div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div dir="rtl"><div class="gmail_quote"><div class="gmail_attr" style="text-align:right" dir="ltr"><b><span lang="HE" style="font-family:&quot;David&quot;,&quot;sans-serif&quot;;font-size:14pt">,חבר יקר</span></b></div><div dir="rtl"><div dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div>Rasmita07<div>Rasmita08<div>Rasmita09<div>Rasmita010<div><div><div><div><div><div>Rasmita011<div><div>Rasmita012<div><div><div>Rasmita00<div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Rasmita000<div>Rasmita0000<div><div><div>Rasmita1 \r\n </div><div></div>Testing Sanitizer is removing divs when the number of open tags are very high<div></div><div><p class="MsoNormal" style="text-align:right;unicode-bidi:embed;direction:rtl" dir="RTL"><span lang="HE"> 
<u></u><u></u></span></p></div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></d
iv><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div lang="EN-US"></div><div lang="EN-US"></div><div lang="EN-US"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div 
class="gmail_quote"></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div> 

Post sanitization

<div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div dir="rtl"><div class="gmail_quote"><div class="gmail_attr" style="text-align:right" dir="ltr"><b><span lang="HE" style="font-family:&#39;david&#39; , &#39;sans-serif&#39;;font-size:14pt">,חבר יקר</span></b></div><div dir="rtl"><div dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div><div><div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div 
lang="EN-US"></div><div lang="EN-US"></div><div lang="EN-US"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote"></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
</div></div></div></div></div> 

Post sanitization: all the divs after Rasmita06 have been removed.

Next, I removed the unneeded wrapper divs from the top, along with their closing tags. The results are below.

<div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div>Rasmita07<div>Rasmita08<div>Rasmita09<div>Rasmita010<div><div><div><div><div><div>Rasmita011<div><div>Rasmita012<div><div><div>Rasmita00<div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Rasmita000<div>Rasmita0000<div><div><div>Rasmita1 \r\n </div><div></div>Testing Sanitizer is removing divs when the number of open tags are very high<div></div><div><p class="MsoNormal" style="text-align:right;unicode-bidi:embed;direction:rtl" dir="RTL"><span lang="HE"> <u></u><u></u></span></p></div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div>
</div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div> 

Post sanitization:

<div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div>Rasmita07<div>Rasmita08<div>Rasmita09<div>Rasmita010<div><div><div><div><div><div>Rasmita011<div><div>Rasmita012<div><div><div>Rasmita00<div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Rasmita000<div>Rasmita0000<div><div><div>Rasmita1 \r\n </div><div></div>Testing Sanitizer is removing divs when the number of open tags are very high<div></div><div><p class="MsoNormal" style="text-align:right;unicode-bidi:embed;direction:rtl" dir="RTL"><span lang="HE"> <u></u><u></u></span></p></div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div>
</div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div> 

Here, the divs after Rasmita06 are not removed.

Sample_html.txt: I have attached the HTML that causes the issue to this bug.
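To check whether a given input really nests deeper than 256 elements, a rough depth count can help. The following is a minimal sketch using only the Java standard library. The class name NestingDepth and its regex are invented for illustration and are not part of java-html-sanitizer. It is not a real HTML parser: it ignores comments, ">" inside attribute values, and the implicit closing of elements such as <p>.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough nesting-depth estimate for an HTML fragment (hypothetical helper). */
final class NestingDepth {
    // Group 1 is "/" for a closing tag; group 2 is the tag name.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    // Void elements never stay open, so they do not add depth.
    private static final Set<String> VOID = new HashSet<>(Arrays.asList(
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"));

    /** Returns the largest number of simultaneously open elements. */
    static int maxDepth(String html) {
        int depth = 0;
        int max = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (VOID.contains(name)) {
                continue;
            }
            if (m.group(1).isEmpty()) {
                depth++;                      // opening tag
                max = Math.max(max, depth);
            } else if (depth > 0) {
                depth--;                      // closing tag
            }
        }
        return max;
    }

    public static void main(String[] args) {
        String html = "<div><div>a</div><div><p>b<br></p></div></div>";
        System.out.println(maxDepth(html)); // prints 3
    }
}
```

Running maxDepth on the full sample and again on the trimmed sample should show whether only the first one crosses the limit.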

mikesamuel commented 4 years ago

I think setNestingLimit is what you want.

jacobrshields commented 4 years ago

> I think setNestingLimit is what you want.

Correct me if I'm wrong, but there doesn't seem to be a way to change this with the public API.

setNestingLimit is called on an instance of TagBalancingHtmlStreamEventReceiver that is constructed inside of the implementation of HtmlSanitizer#sanitize, so there is no way to access that TagBalancingHtmlStreamEventReceiver.

One could try to reimplement their own version of HtmlSanitizer#sanitize. However, that method relies on the HtmlLexer class, which is not public, so there is no way to access HtmlLexer without reflection.

mikesamuel commented 4 years ago

@jacobrshields IIRC, the lexer is invoked by HtmlSanitizer.sanitize, which you can supply with a policy that chains together the balancer with other bits and bobs if you're so inclined.

Some of the example code sets up custom pipelines:

https://github.com/OWASP/java-html-sanitizer/blob/fd6b2ddbf7ded2ba4fc1d718e450dc701368cdba/src/main/java/org/owasp/html/examples/EbayPolicyExample.java#L213-L231

jacobrshields commented 4 years ago

@mikesamuel I'm not sure I'm following completely.

You can configure your policy with your own HtmlStreamEventReceiver, and you can specify your own HtmlStreamEventProcessor when you call HtmlSanitizer#sanitize, but ultimately a TagBalancingHtmlStreamEventReceiver with a nesting limit of 256 is hard-coded to be wrapped around any preprocessor you pass in:

https://github.com/OWASP/java-html-sanitizer/blob/fd6b2ddbf7ded2ba4fc1d718e450dc701368cdba/src/main/java/org/owasp/html/HtmlSanitizer.java#L228-L253

Are you saying you could define an HtmlStreamEventReceiver or an HtmlStreamEventProcessor such that they hack around the hard-coded TagBalancingHtmlStreamEventReceiver? It's not clear to me how one would accomplish this.
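The wrapping described above can be pictured with a toy model. This is only a sketch: ToyReceiver, ToyRenderer, ToyNestingLimiter and NestingLimitDemo are invented names, not the library's classes, and the exact boundary conditions in TagBalancingHtmlStreamEventReceiver may differ. It shows how a wrapper that counts open elements drops tags and text nested deeper than its limit. If such a wrapper is constructed inside the sanitize method with a fixed limit, as this comment describes, callers have no way to pass a different value.

```java
/** Hypothetical, much-simplified stand-in for an HTML event receiver. */
interface ToyReceiver {
    void openTag(String name);
    void closeTag(String name);
    void text(String text);
}

/** Renders the events it receives back into markup. */
final class ToyRenderer implements ToyReceiver {
    private final StringBuilder out = new StringBuilder();
    @Override public void openTag(String name) { out.append('<').append(name).append('>'); }
    @Override public void closeTag(String name) { out.append("</").append(name).append('>'); }
    @Override public void text(String text) { out.append(text); }
    String result() { return out.toString(); }
}

/** Wraps another receiver and drops whatever is nested deeper than the limit. */
final class ToyNestingLimiter implements ToyReceiver {
    private final ToyReceiver underlying;
    private final int nestingLimit;
    private int open = 0; // elements currently open, whether emitted or dropped

    ToyNestingLimiter(ToyReceiver underlying, int nestingLimit) {
        this.underlying = underlying;
        this.nestingLimit = nestingLimit;
    }

    @Override public void openTag(String name) {
        if (open < nestingLimit) { underlying.openTag(name); }
        open++;
    }

    @Override public void closeTag(String name) {
        if (open == 0) { return; }            // ignore unmatched closing tags
        open--;
        if (open < nestingLimit) { underlying.closeTag(name); }
    }

    @Override public void text(String text) {
        if (open <= nestingLimit) { underlying.text(text); }
    }
}

final class NestingLimitDemo {
    /** Feeds <div>a<div>b<div>c</div></div></div> through a limiter. */
    static String run(int nestingLimit) {
        ToyRenderer renderer = new ToyRenderer();
        ToyReceiver limited = new ToyNestingLimiter(renderer, nestingLimit);
        String[] texts = {"a", "b", "c"};
        for (String t : texts) {
            limited.openTag("div");
            limited.text(t);
        }
        for (int i = 0; i < texts.length; i++) {
            limited.closeTag("div");
        }
        return renderer.result();
    }

    public static void main(String[] args) {
        System.out.println(run(2)); // <div>a<div>b</div></div>
        System.out.println(run(3)); // <div>a<div>b<div>c</div></div></div>
    }
}
```

With a limit of 2 the innermost div and its text disappear while the outer structure stays balanced, which is the same kind of truncation reported in this issue.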

log2akshat commented 2 years ago

In Zimbra we also faced the same issue, and fixed it by raising the limit's value: https://github.com/Zimbra/java-html-sanitizer-release-20190610.1/commit/bc58f5843b0f5d05d89e94ccd146f21aeb18df81