GeneThomas opened this issue 4 years ago
I debugged it, and what slows it down is mainly the computation of the style rules. I don't need styles for InnerText, apart from default rules such as the line breaks for paragraphs and divs, so I added two null checks.
With those, I can use InnerText without specifying .WithCss and without calling WithRenderDevice. Your code then parses in 25 ms instead of 8 minutes.
I will use my fork for now, because this is probably not an acceptable solution for Florian.
Bug Report
I am writing what I would think is a fairly simple use of AngleSharp[.Css]: I am extracting an HTML table of COVID-19 cases etc. by country. The headers (or other cells) can contain an HTML <br>. INode.Text() (an extension method) and INode.TextContent remove the <br>, returning values like “TotalCases”. My implementation parses the roughly 3000 cells in 4.6 seconds. Using AngleSharp.Css’s ElementExtensions method string GetInnerText(this IElement element) takes over 8 minutes, which makes it unusable.
I assume you have to implement CSS’s display:none and visibility:hidden. I do not need that functionality, just as I do not need a JavaScript implementation. If GetInnerText() cannot be sped up, a reasonable solution would be to use something like my code together with your implementation of HTML entities such as © etc.
The relevant code in the attached project is in AngleSharpCssSpeedFault.cs. Attachment: AngleSharpCssSpeedFault.zip
The last method, InnerText(IElement), has an #if that switches between the two implementations of InnerText().
Prerequisites
Run the attached solution.
Description
See above.
Steps to Reproduce
Possible Solution
Use my InnerText(), but add the expansion of all HTML &-entities, since that is missing.
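The report's own C# InnerText() is not reproduced here. As a language-neutral illustration of the idea (approximate innerText with no CSS computation, keeping only default line breaks and decoding entities), here is a minimal Python sketch. It is not AngleSharp's API. InnerTextParser and inner_text are hypothetical names. The set of block tags is a hand-picked subset of a browser's default stylesheet. Entity decoding comes from Python's html.parser with convert_charrefs=True, not from AngleSharp's entity table. Table cells are not tab-separated, so the sketch is meant to be called on one cell's HTML at a time.

```python
import re
from html.parser import HTMLParser

# Assumption: a hand-picked subset of tags that a browser's default
# stylesheet renders as blocks (i.e. on their own line).
BLOCK_TAGS = {"p", "div", "tr", "li", "table", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose content is never rendered, so their text is skipped.
SKIP_TAGS = {"script", "style", "head", "template"}
# HTML whitespace only; deliberately excludes U+00A0 (&nbsp;), which
# browsers do not collapse.
HTML_SPACE = re.compile(r"[ \t\n\r\f]+")


class InnerTextParser(HTMLParser):
    """Hypothetical helper: approximates innerText without computing CSS."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &copy; and &amp; before passing text to handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def _newline(self):
        # Emit at most one line break between adjacent blocks.
        if self.parts and not self.parts[-1].endswith("\n"):
            self.parts.append("\n")

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag in BLOCK_TAGS:
            self._newline()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._newline()

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Collapse whitespace runs, as CSS "white-space: normal" would.
        self.parts.append(HTML_SPACE.sub(" ", data))


def inner_text(html_fragment):
    """Hypothetical helper: returns the approximate innerText of a fragment."""
    parser = InnerTextParser()
    parser.feed(html_fragment)
    parser.close()
    text = "".join(parser.parts)
    # Trim the spaces the collapse leaves at line edges, then outer newlines.
    return "\n".join(line.strip(" ") for line in text.split("\n")).strip("\n")
```

For example, inner_text("<th>Total<br>Cases</th>") yields "Total\nCases" rather than "TotalCases". Self-closing <br/> works too, because HTMLParser's default handle_startendtag calls handle_starttag.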