fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Infinite loop in HtmlDocument.Parse #1264

Closed: njlr closed this issue 3 years ago

njlr commented 5 years ago

This program never finishes and consumes ever-increasing amounts of memory:

open System
open FSharp.Data

let content =
  """
  Steve Jobs
  steve@apple.com

  Education:
    - Master of Mathematics Honours Computer Science and Combinatorics &
      Optimization. I
      specialized in systems and real-time programming, programming language
      implementation, and mathematical optimization.

  Skills:
    - Proficient in Rust, C++, Scheme, x86(_64) LaTeX,
      (Postgre)SQL, Gurobi, AWS, Google Cloud Platform, .NET (Core), C#,
      Python, low-level profiling and optimization on Linux and Windows.

    - Can do things with Java, Haskell, Clojure,
      Scala, AMPS, redis, OpenGL.

    Instructional support assistant at the School,
    September to January 2010.
      - Started the Java project[3], a custom IDE for students in an
        introductory computer science course.

  """

[<EntryPoint>]
let main argv =
  printfn "%s" "Parsing HTML... "

  let html = HtmlDocument.Parse content

  printfn "%s" "done. "

  0

I would expect it to return an "invalid HTML" error.

adz commented 5 years ago

Thank you so much for this report.

I've been hunting for the cause of memory spikes in a server process (memory grows until the process is killed), and they disappear completely when the FSharp.Data HTML parser is not used. The parser call is part of a larger app, and I hadn't suspected this code.

We have determined that the issue occurs when there is an "&" without a corresponding ";" or "<" character.
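Going by that diagnosis, one possible stopgap for affected versions is to escape every "&" that does not start a well-formed, ";"-terminated character reference before the text reaches HtmlDocument.Parse. This is a minimal sketch under that assumption only: it is a hypothetical workaround, not taken from FSharp.Data or from any PR, and the helper name escapeBareAmpersands is made up for illustration.

```fsharp
open System.Text.RegularExpressions

// Hypothetical workaround, not part of FSharp.Data. It assumes (per the
// diagnosis in this thread) that an '&' with no terminating ';' is what
// sends the parser into a loop.
// The regex matches an '&' that is NOT the start of a ';'-terminated
// decimal (&#38;), hexadecimal (&#x26;), or named (&amp;) reference.
let bareAmpersand =
    Regex(@"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")

/// Rewrites each bare '&' as "&amp;", leaving valid references untouched.
let escapeBareAmpersands (html: string) : string =
    bareAmpersand.Replace(html, "&amp;")

// Example based on the "Combinatorics &" line from the report above.
// prints: Combinatorics &amp; Optimization
printfn "%s" (escapeBareAmpersands "Combinatorics & Optimization")
```

Usage would then look like HtmlDocument.Parse (escapeBareAmpersands content). One trade-off: HTML5 still recognizes a few legacy named entities without a trailing ";" (for example "&copy"), and this sketch escapes those too, so they would appear literally rather than as the symbol.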

dsyme commented 5 years ago

Would be great to get this fixed, anyone want to take a stab at it?

colinbull commented 5 years ago

If this is wrapped in a CDATA section, does it complete?

tomakita commented 5 years ago

Hi @colinbull, are you still working on this? I think I have a fix (I didn't realize someone had already assigned this issue), but won't submit a PR if you're on the case already.

colinbull commented 5 years ago

I've had a look and have a fix, but I haven't tested it yet. Sorry, I ran out of time. Feel free to submit a PR and I'll review it.

Many thanks.

(Sent by email in reply to Tom Akita's comment of Fri, 4 Oct 2019 at 00:29.)

tomakita commented 5 years ago

Ok, sounds good. I'll submit my PR after I've cleaned some things up. If nothing else, maybe we can compare our approaches -- mine might not be a good way to do it.

colinbull commented 5 years ago

Yeah, no worries. Just mention me in the PR and I'll take a look.

(Sent by email in reply to Tom Akita's comment of Fri, 4 Oct 2019 at 00:32.)

cartermp commented 3 years ago

https://github.com/fsprojects/FSharp.Data/pull/1393