fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Html Parser: `Invalid css selector syntax` exception thrown for css pseudo-classes #1383

Open EverybodyKurts opened 3 years ago

EverybodyKurts commented 3 years ago

I'm trying to select the first paragraph in a group of sibling paragraphs. According to the MDN nth-child() entry, I should be able to do this using :nth-child().

However, this results in a syntax error when using HtmlDocument.CssSelect:

```
> node.CssSelect("p:nth-child(1)");;
System.Exception: Invalid css selector syntax (char ':' at offset 0)
   at FSharp.Data.HtmlCssSelectors.tokenize'@104(CssSelectorTokenizer this, FSharpList`1 acc, FSharpList`1 sourceChars)
   at FSharp.Data.HtmlCssSelectors.CssSelectorTokenizer.tokenize()
   at FSharp.Data.HtmlCssSelectors.CssSelectorTokenizer.Tokenize(String pCssSelector)
   at FSharp.Data.HtmlNodeModule.Select(IEnumerable`1 nodes, String selector)
   at <StartupCode$FSI_0029>.$FSI_0029.main@()
```

EverybodyKurts commented 3 years ago

I see that this pseudo-class is listed as a todo in http://fsprojects.github.io/FSharp.Data/library/HtmlCssSelectors.html#Implemented-and-missing-features. Perhaps a NotImplementedException could be thrown instead of a general System.Exception?
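
Until the selector is supported, something like the following might work as a stopgap. It is only a rough sketch: `firstChildParagraphs` is a hypothetical helper, not part of FSharp.Data, and it assumes that text and comment nodes report an empty name so they can be skipped when looking for the first child element. It walks the tree and keeps every `p` that is the first element child of its parent, which is what `p:nth-child(1)` would match.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical helper (not part of FSharp.Data): emulates `p:nth-child(1)`
// by keeping each <p> that is the first element child of its parent.
let firstChildParagraphs (root: HtmlNode) : HtmlNode list =
    root.DescendantsAndSelf()
    |> Seq.choose (fun parent ->
        parent.Elements()
        // Assumption: non-element nodes (text, comments) have an empty name.
        |> List.filter (fun child -> child.Name() <> "")
        |> List.tryHead)
    |> Seq.filter (fun child -> child.Name() = "p")
    |> Seq.toList

// Example input made up for illustration.
let doc = HtmlDocument.Parse("<html><body><div><p>a</p><p>b</p></div></body></html>")
let firstParagraphs = firstChildParagraphs (doc.Body())
```

For the sample markup above, the intent is that only the paragraph containing "a" is returned, since the second `p` is not the first child of the `div`.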

EverybodyKurts commented 3 years ago

For future reference (mostly my own), the tokenizing process happens in this function:

https://github.com/fsprojects/FSharp.Data/blob/main/src/Html/HtmlCssSelectors.fs#L103