elm / parser

A parsing library, focused on simplicity and great error messages
https://package.elm-lang.org/packages/elm/parser/latest
BSD 3-Clause "New" or "Revised" License
228 stars 46 forks source link

Inconsistent internal parser state #53

Open pithub opened 1 year ago

pithub commented 1 year ago

This issue describes a bug in Elm.Kernel.Parser.findSubString.


Note: the following issues describe symptoms of this bug:

In the same way, the following pull request tries to fix the symptoms:


The Elm Parser internally keeps track of the current position in two ways:

Normally both kinds of position infos (row and column vs. offset) are in sync with each other. (For a given source string, you can calculate both row and column from the offset and vice versa.)

The bug in Elm.Kernel.Parser.findSubString breaks this synchronicity, though. This affects the following parsers:

They set...

Here's an example with chompUntil:

import Parser exposing ((|.), (|=), Parser)

testParser : Parser { row : Int, col : Int, offset : Int }
testParser =
    Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
        |. Parser.chompUntil "token"
        |= Parser.getRow
        |= Parser.getCol
        |= Parser.getOffset

Parser.run testParser "< token >"
--> Ok { row = 1, col = 8, offset = 2 }

The state after the test parser is run:


The root cause for these bugs lies in the Elm.Kernel.Parser.findSubString function: https://github.com/elm/parser/blob/02839df10e462d8423c91917271f4b6f8d2f284d/src/Elm/Kernel/Parser.js#L120-L134

If the smallString is found, the returned newOffset is at the position before the smallString (the result of the indexOf function), but the new row and col after the smallString (at the target position).


Note: the following pull request tries to fix the comment of the Elm.Kernel.Parser.findSubString function to correctly describe the buggy behavior:

miniBill commented 5 months ago

I've now created an elm-review rule to check for this https://github.com/miniBill/elm-review-no-broken-elm-parser-functions