AngleSharp / AngleSharp.Css

Library to enable support for cascading stylesheets in AngleSharp.
https://anglesharp.github.io
MIT License

GetInnerText() converts '\u3000' to ' ' #78

Closed wuyu8512 closed 2 years ago

wuyu8512 commented 3 years ago

Bug Report

Description

GetInnerText() returns the ideographic space U+3000 as a regular space U+0020.

Steps to Reproduce

  1. Run the following code:

    var Config = Configuration.Default.WithCss();
    var Context = BrowsingContext.New(Config);
    var HtmlParser = Context.GetService<IHtmlParser>();

    var doc = HtmlParser.ParseDocument("<h4>第三章\u3000夢與超能力</h4>");
    Console.WriteLine(doc.DocumentElement.GetInnerText());
    Console.WriteLine(doc.DocumentElement.GetInnerText() == "第三章\u3000夢與超能力");

**Expected behavior:** `True`

**Actual behavior:** `False`

**Environment details:** Windows 10 x64, .NET 5

FlorianRappl commented 3 years ago

Hm, I may be missing something here. The \u3000 escape is already resolved by C#, so there is nothing AngleSharp can do about it; the string should contain the actual U+3000 character. So what you pass to AngleSharp is "<h4>第三章　夢與超能力</h4>". Is this special character now removed? What does the inner text actually look like?

wuyu8512 commented 3 years ago

> What does the inner text actually look like?

第三章\u0020夢與超能力

\u3000 became \u0020
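
Because U+3000 and U+0020 can look alike in a console, one way to confirm the substitution is to print the UTF-16 code units of both strings. The following is a minimal sketch that uses only the base class library; `DumpCodeUnits` is a hypothetical helper introduced here for illustration, not part of AngleSharp, and the `actual` string is hard-coded from the output reported above rather than taken from a live `GetInnerText()` call.

```csharp
using System;
using System.Linq;

public static class CodeUnitDump
{
    // Hypothetical helper: formats every UTF-16 code unit of a string as U+XXXX.
    public static string DumpCodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        // The string the comparison in the report expects.
        var expected = "第三章\u3000夢與超能力";
        // The string reported as the actual inner text (assumed, copied from the thread).
        var actual = "第三章\u0020夢與超能力";

        Console.WriteLine(DumpCodeUnits(expected));
        Console.WriteLine(DumpCodeUnits(actual));

        // The strings differ only in the fourth code unit, so this prints False.
        Console.WriteLine(expected == actual);
    }
}
```

In a real check, the `actual` value would be replaced by the result of `doc.DocumentElement.GetInnerText()` from the reproduction steps.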