brandonchinn178 / toml-reader

Haskell library for parsing v1.0 TOML files
https://hackage.haskell.org/package/toml-reader
BSD 3-Clause "New" or "Revised" License

Potential DOS with exponent notation? #8

Closed: david-christiansen closed this issue 2 years ago

david-christiansen commented 2 years ago

Thanks for writing a library to parse TOML 1.0!

The TOML spec allows exponents to be used in integers. When I try to decode a TOML.Value from this:

[thing]
oops = 1e1000000000000000000000000000000000000000000000000000000000000000

the parser takes a very long time. I presume it's allocating a very large integer or doing a lot of multiplications, but I haven't dug into the source code to check.

I think Aeson uses an explicit scientific-notation representation (Scientific) for numbers, which preserves the exponent as written and gives client libraries a way to check whether a value is in range. Is this something worth doing here?
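For illustration, here is a minimal base-only sketch of the kind of range check such a representation enables. It assumes a hypothetical Sci type holding a coefficient and a base-10 exponent exactly as written (this is not toml-reader's or Aeson's actual type), plus a hypothetical toInt64 helper that rejects out-of-range values without ever computing 10^e for a huge e:

```haskell
import Data.Int (Int64)

-- Hypothetical Aeson-style number: value = coefficient * 10^exponent,
-- kept exactly as written in the source instead of being expanded.
data Sci = Sci Integer Integer

-- | Return the value as an Int64 if it is an exact integer in range.
-- Huge positive or negative exponents are rejected before 10^e is built.
toInt64 :: Sci -> Maybe Int64
toInt64 (Sci c e)
  | c == 0 = Just 0
  | e >= 19 = Nothing               -- |c * 10^e| >= 10^19 > maxBound (about 9.22e18)
  | e >= 0 = inRange (c * 10 ^ e)   -- e is at most 18 here, so this is cheap
  | negate e > digits = Nothing     -- 0 < |c| < 10^(-e): the value has a fractional part
  | r /= 0 = Nothing                -- nonzero remainder: not an integer
  | otherwise = inRange q
  where
    -- Number of decimal digits in the coefficient.
    digits = fromIntegral (length (show (abs c))) :: Integer
    -- Only forced when -e <= digits, so 10^(-e) stays small.
    (q, r) = c `quotRem` (10 ^ negate e)
    inRange v
      | v >= toInteger (minBound :: Int64) && v <= toInteger (maxBound :: Int64) =
          Just (fromInteger v)
      | otherwise = Nothing

main :: IO ()
main = do
  print (toInt64 (Sci 15 2))                  -- Just 1500
  print (toInt64 (Sci 1500 (-2)))             -- Just 15
  print (toInt64 (Sci 15 (-1)))               -- Nothing (1.5 is not an integer)
  print (toInt64 (Sci 1 (10 ^ (64 :: Int))))  -- Nothing, returned immediately
```

The scientific package that Aeson builds on provides a comparable function, toBoundedInteger; the sketch above only shows the idea of checking the exponent before expanding the number.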

brandonchinn178 commented 2 years ago

Thanks for the issue! A couple of things here: when you use scientific notation, the value is a float, not an integer. Unlike JSON, TOML distinguishes between integers and floats. Also, TOML's spec explicitly says that floats should be represented as an IEEE double-precision float, which makes me hesitant to use an arbitrary-precision representation. (The Integer spec does say "the implementation should support 64-bit signed integers", which I take to mean "at least 64 bits", whereas the Float spec says floats "should be implemented as a double precision float", which pins down the representation.)
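A minimal sketch of that integer/float distinction, assuming the literal has already been validated against the TOML 1.0 grammar; NumKind and classify are hypothetical names, not part of toml-reader. The 0x/0o/0b prefix check comes first because hexadecimal integers such as 0xDEADBEEF can contain the letter e:

```haskell
-- Hypothetical classification of an already-validated TOML 1.0 number literal.
data NumKind = IntegerLit | FloatLit
  deriving (Eq, Show)

classify :: String -> NumKind
classify lit
  | take 2 body `elem` ["0x", "0o", "0b"] = IntegerLit  -- prefixed integers; hex digits may include 'e'
  | body `elem` ["inf", "nan"] = FloatLit                -- special float values
  | any (`elem` ".eE") body = FloatLit                   -- a fractional part or an exponent makes a float
  | otherwise = IntegerLit
  where
    -- Drop a leading sign so "-inf" and "+1e5" are handled like "inf" and "1e5".
    body = dropWhile (`elem` "+-") lit

main :: IO ()
main = mapM_ (\s -> putStrLn (s ++ " -> " ++ show (classify s)))
  ["1e1000", "1.0e100", "42", "1_000", "0xDEADBEEF", "-inf"]
```

Under this rule both 1e100 and 1.0e100 are floats; an exponent alone is enough.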

The design space is also a bit different here, because TOML generally isn't used with untrusted input.

Also, does the parser itself take a long time? With Haskell's laziness, I would expect parsing to finish quickly and the cost to show up only when you explicitly inspect the value.
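A small illustration of the laziness point using only base (not toml-reader itself): an enormous Integer can be defined and stored in a data structure without being computed, as long as nothing demands its value.

```haskell
-- An Integer with roughly 10^64 decimal digits. Computing it is infeasible,
-- but defining it costs nothing because Haskell evaluates it lazily.
huge :: Integer
huge = 10 ^ (10 ^ (64 :: Int) :: Integer)

-- Storing the unevaluated value in a list does not force it either.
boxed :: [Integer]
boxed = [huge]

main :: IO ()
main = print (length boxed)  -- length walks the list spine only; prints 1
```

Forcing huge (for example by printing it, or by a naive conversion to Double that expands 10^e first) is where the time would go.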

Related: https://github.com/toml-lang/toml/issues/538

brandonchinn178 commented 2 years ago

Update: just tested it; the following code runs and exits immediately:

case parseTOML "" "a = 1e1000000000000000000000000000000000000000000000000000000000000000" of
  Right _ -> return ()
  Left e -> error $ show e

so the slow part is converting the parsed value into a Double.

One sane thing I could do right now is check whether the exponent is greater than some arbitrary value like 1000 (since Double's maximum value is around 1.8 × 10^308) and just return infinity immediately.
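A minimal sketch of that early return, assuming the parsed float is available as an integer coefficient m and a base-10 exponent e (value m × 10^e); sciToDouble is a hypothetical name, not toml-reader's API. As a variant of the fixed-threshold idea, it compares the decimal magnitude (digit count of m plus e) against ±400, comfortably outside Double's range of roughly 4.9 × 10^-324 to 1.8 × 10^308, so a long mantissa is taken into account as well:

```haskell
-- Hypothetical conversion of m * 10^e to Double that never builds 10^e
-- when the result is certain to overflow to infinity or underflow to zero.
sciToDouble :: Integer -> Integer -> Double
sciToDouble m e
  | m == 0 = 0
  | magnitude > 400 = sign * (1 / 0)  -- |value| >= 10^400, above Double's max (about 1.8e308)
  | magnitude < -400 = sign * 0       -- |value| < 10^-400, below the smallest Double, so signed zero
  | otherwise = fromRational (fromInteger m * 10 ^^ e)  -- exact Rational, converted once
  where
    -- |m| has d digits, so |value| lies in [10^(d+e-1), 10^(d+e)).
    magnitude = fromIntegral (length (show (abs m))) + e
    sign = signum (fromInteger m) :: Double

main :: IO ()
main = do
  print (sciToDouble 15 (-1))                           -- 1.5
  print (sciToDouble 1 (10 ^ (64 :: Int)))              -- Infinity, returned immediately
  print (sciToDouble (-3) (negate (10 ^ (64 :: Int))))  -- -0.0
```

The threshold of 400 is an arbitrary choice for this sketch; any bound safely past both ends of Double's range behaves the same.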

david-christiansen commented 2 years ago

Sorry for the slow responses here!

I had assumed that 1e100 was an integer, and 1.0e100 a float. I suppose I read it wrong :-)

Thanks again!