fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Infinite loop on invalid HTML #1394

Closed njlr closed 3 years ago

njlr commented 3 years ago
#r "nuget: FSharp.Data, 4.2.3"

open FSharp.Data

let content = """<html>
</html
"""

let doc = HtmlDocument.Parse(content) // Never terminates

printfn "%A" doc

(This is a different issue from the one I reported previously.)

$ dotnet --version
5.0.103

albert-du commented 3 years ago

Seems to get stuck here: https://github.com/fsprojects/FSharp.Data/blob/9dff16cb053709b77a7a124bd59b0b11a17358b9/src/Html/HtmlParser.fs#L680-L687

Adding a case for end of file seems to fix it:

and attributeName state =
    match state.Peek() with
    | '=' -> state.Pop(); beforeAttributeValue state
    | '/' -> state.Pop(); selfClosingStartTag state
    | '>' -> state.Pop(); state.EmitTag(false)
    | TextParser.LetterDigit _ -> state.ConsAttrName(); attributeName state
    | TextParser.Whitespace _ -> afterAttributeName state
    | TextParser.EndOfFile _ -> state.EmitTag(true)
    | _ -> state.ConsAttrName(); attributeName state

let content = """<html>
</html
"""
printfn "%A" (HtmlDocument.Parse content) // prints "<html />"
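
For readers wondering why a missing end-of-file case could hang the parser, here is a toy model of the failure mode. It is only a sketch and not FSharp.Data's code: `Cursor`, `eof`, `handleEof`, and the `budget` step limit are all hypothetical names made up for illustration. The assumption is that peeking past the end of input keeps returning a sentinel and that popping there does nothing, so a catch-all branch that pops and recurses never makes progress.

```fsharp
// Toy model only; every name here is hypothetical, not FSharp.Data's API.
let eof = '\uffff' // sentinel returned by Peek once the input is exhausted

type Cursor(text: string) =
    let mutable pos = 0
    member _.Peek() = if pos < text.Length then text.[pos] else eof
    // Popping at end of input is a no-op, so the cursor stops advancing.
    member _.Pop() = if pos < text.Length then pos <- pos + 1

/// Reads an attribute name. Returns None when the step budget runs out,
/// which stands in for "never terminates" in this sketch.
let rec attributeName (handleEof: bool) (budget: int) (cur: Cursor) (acc: string) =
    if budget = 0 then None
    else
        match cur.Peek() with
        | '=' | '>' -> Some acc
        | c when handleEof && c = eof -> Some acc // the added end-of-file case
        | c -> cur.Pop(); attributeName handleEof (budget - 1) cur (acc + string c)

printfn "%A" (attributeName false 1000 (Cursor "html") "") // None: budget exhausted
printfn "%A" (attributeName true 1000 (Cursor "html") "")  // Some "html"
```

Without the end-of-file branch, the catch-all keeps consuming the sentinel until the budget is gone; with it, the function returns as soon as the input ends. The budget exists only so the sketch terminates when run.
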
cartermp commented 3 years ago

@albert-du would be happy to accept a contribution!

joshuapassos commented 3 years ago

@cartermp can you close this issue?

cartermp commented 3 years ago

Whoops, looks like I forgot to do that, whee