GoogleCodeExporter closed this issue 8 years ago
This issue was closed by revision r281.
Original comment by sjdir...@gmail.com
on 8 Mar 2013 at 8:00
Patched HtmlAgilityPack to fix this issue. Added
HtmlDocument.OptionMaxNestedChildNodes, which can be set to prevent the
StackOverflowExceptions caused by deeply nested tags. When the limit is
exceeded, it throws an ApplicationException with the message "Document has
more than X nested tags. This is likely due to the page not closing tags
properly."
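Usage...
    HtmlDocument hapDoc = new HtmlDocument();
    // Give up once the document nests more than 5000 levels deep.
    hapDoc.OptionMaxNestedChildNodes = 5000;
    try
    {
        hapDoc.LoadHtml(RawContent);
    }
    catch (ApplicationException)
    {
        // Nesting limit exceeded: fall back to an empty document.
        hapDoc.LoadHtml("");
    }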
Attached new HtmlAgilityPack.dll assembly. Will submit this patch to the
HtmlAgilityPack project site.
Original comment by sjdir...@gmail.com
on 8 Mar 2013 at 8:07
Added all source and binary to the hap project site...
http://www.codeplex.com/site/users/view/sjdirect
Original comment by sjdir...@gmail.com
on 8 Mar 2013 at 8:52
Attached full patch zip submitted to hap project
Original comment by sjdir...@gmail.com
on 8 Mar 2013 at 9:29
[deleted comment]
Hello. Although I am using your HtmlAgilityPack.dll as recommended, I am
still getting the StackOverflowException. Please check the screenshot.
I hope you can help with this.
Thanks in advance.
Original comment by fastoka...@gmail.com
on 3 Sep 2013 at 11:00
Hi, can you narrow it down to a single page/URL? HAP uses many stacks in its
implementation. I only fixed the one related to nested HTML tags, so it is
likely that other conditions can still cause stack overflows.
Original comment by sjdir...@gmail.com
on 3 Sep 2013 at 3:48
Hello,
I was running the crawler against the following site:
http://www.gesetze-im-internet.de/aktuell.html , fetching the XML files within
it. It has over 200,000 pages with nested HTML tags.
I suspect it is related to the Visual Studio stack. I will test this today,
just wanted to let you know :)
Original comment by fastoka...@gmail.com
on 4 Sep 2013 at 6:43
It turns out that setting HtmlDocument.OptionFixNestedTags = true solves this
issue without needing the patched version.
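A minimal sketch of that setting, assuming the stock HtmlAgilityPack assembly is referenced. The class name NestedTagExample and the rawContent parameter are hypothetical names chosen for illustration, not part of the original report:

```csharp
using HtmlAgilityPack;

static class NestedTagExample
{
    // rawContent is a hypothetical parameter holding the fetched page markup.
    public static HtmlDocument Parse(string rawContent)
    {
        var hapDoc = new HtmlDocument();

        // Set the option before loading so it applies while parsing.
        hapDoc.OptionFixNestedTags = true;

        hapDoc.LoadHtml(rawContent);
        return hapDoc;
    }
}
```

Whether this avoids the overflow for every page is untested here. As noted earlier in the thread, other conditions may still trigger a StackOverflowException.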
Original comment by sjdir...@gmail.com
on 10 Jul 2015 at 6:13
Original issue reported on code.google.com by
sjdir...@gmail.com
on 8 Mar 2013 at 7:47