elm / parser

A parsing library, focused on simplicity and great error messages
https://package.elm-lang.org/packages/elm/parser/latest
BSD 3-Clause "New" or "Revised" License

Fix for the inconsistent internal parser state bug #54

Open pithub opened 1 year ago

pithub commented 1 year ago

This code change fixes the following issues:

and makes the following pull requests superfluous:


As described in issue #53, there's a bug in the Elm.Kernel.Parser.findSubString function that leads to inconsistent internal parser positions where:

This code change fixes that bug, so that both offset and row/column are consistently positioned after the token.


There are two reasons why I chose the after-token position rather than the before-token position:

import Parser exposing ((|.), (|=), Parser)

testParser : Parser { row : Int, col : Int, offset : Int }
testParser =
    Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
        |. Parser.multiComment "{-" "-}" Parser.Nestable
        |= Parser.getRow
        |= Parser.getCol
        |= Parser.getOffset

Parser.run testParser "{- -}"
--> Ok { row = 1, col = 6, offset = 5 }
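To make "consistent" concrete, here is a small stand-alone check in plain JavaScript. It is only a sketch: `positionAt` and `isConsistent` are hypothetical helpers, not part of elm/parser, and they assume the usual convention that rows and columns start at 1 and that a surrogate pair counts as one column.

```javascript
// Hypothetical helper (not part of elm/parser): recompute row and column
// by walking the source from the start up to the given offset.
function positionAt(src, offset) {
  var row = 1, col = 1, i = 0;
  while (i < offset) {
    var code = src.charCodeAt(i++);
    if (code === 0x000A) {
      // Newline: move to the first column of the next row.
      row++;
      col = 1;
    } else {
      col++;
      // Surrogate code unit: skip its partner so the pair counts as one column.
      if ((code & 0xF800) === 0xD800) i++;
    }
  }
  return { row: row, col: col };
}

// A parser state is consistent when row/col match what the offset implies.
function isConsistent(src, state) {
  var p = positionAt(src, state.offset);
  return p.row === state.row && p.col === state.col;
}

console.log(isConsistent("{- -}", { row: 1, col: 6, offset: 5 })); // true
console.log(isConsistent("{- -}", { row: 1, col: 6, offset: 3 })); // false
```

The second call uses a made-up state whose offset points before the closing token while the column points after it, which is the kind of mismatch this change is meant to rule out.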

The real bug fix is on this line:

src/Elm/Kernel/Parser.js
@@ -133 +133 @@ var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString
-       return __Utils_Tuple3(newOffset, row, col);
+       return __Utils_Tuple3(index < 0 ? -1 : target, row, col);

where we either return -1 to signal the "subString not found" case, or the offset after the subString, which is stored in variable target.
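
For readers who want to try the patched logic outside the kernel, the following is a stand-alone sketch. It makes some assumptions: the real function is curried through F5 and returns __Utils_Tuple3, whereas here a plain function and a plain array stand in for them, and the loop between the two changed hunks is reconstructed from memory rather than quoted from this diff.

```javascript
// Stand-alone sketch of the patched logic (assumption: plain function and
// array instead of the kernel's F5 wrapper and __Utils_Tuple3).
function findSubString(smallString, offset, row, col, bigString) {
  var index = bigString.indexOf(smallString, offset);
  var target = index < 0 ? bigString.length : index + smallString.length;

  // Advance row/col over every code unit from offset up to target
  // (assumed loop body; it is not part of the hunks shown in this PR).
  while (offset < target) {
    var code = bigString.charCodeAt(offset++);
    if (code === 0x000A) {
      col = 1;
      row++;
    } else {
      col++;
      // Surrogate code unit: skip its partner so the pair counts as one column.
      if ((code & 0xF800) === 0xD800) offset++;
    }
  }

  // -1 signals "not found"; otherwise return the offset after the subString.
  return [index < 0 ? -1 : target, row, col];
}

console.log(findSubString("42", 0, 1, 1, "Is 42 the answer?")); // [ 5, 1, 6 ]
console.log(findSubString("42", 7, 1, 8, "Is 42 the answer?")); // [ -1, 1, 18 ]
```

The second call assumes the caller passes col = 8 for offset = 7, which is what a single-line input implies. In the "not found" case row and col still advance to the end of the input, while the offset is reported as -1.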

So what about all the other changes?


First, I decided to rename the newOffset variable:

src/Elm/Kernel/Parser.js
@@ -122,2 +122,2 @@ var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString
-       var newOffset = bigString.indexOf(smallString, offset);
-       var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;
+       var index = bigString.indexOf(smallString, offset);
+       var target = index < 0 ? bigString.length : index + smallString.length;

The variable no longer holds the new offset (we return either -1 or target), so the old name would be misleading. I renamed it to "index" because it holds the result of the indexOf function.


In the Parser.Advanced module, there's a wrapper function for every Kernel function. The comment on the findSubString wrapper function was already wrong before this code change (see #37 "Fix comment in findSubString"), and it would still have been wrong afterwards, so I changed it to document that we return the position after the subString:

src/Parser/Advanced.elm
@@ -1125,7 +1125,7 @@ isAsciiCode =
     findSubString "42" offset row col "Is 42 the answer?"
         --==> (newOffset, newRow, newCol)

-If `offset = 0` we would get `(3, 1, 4)`
+If `offset = 0` we would get `(5, 1, 6)`
 If `offset = 7` we would get `(-1, 1, 18)`
 -}
 findSubString : String -> Int -> Int -> Int -> String -> (Int, Int, Int)

The wrapper functions hide from the rest of the module the fact that they are implemented in JavaScript rather than in Elm. The rest of the code uses only the wrapper functions, just as if they had been implemented in Elm.

The only exception to this rule was a line where the findSubString Kernel function was called directly. Since we are modifying the findSubString function anyway, I thought it appropriate to change this line too, for consistency with the rest of the code:

src/Parser/Advanced.elm
@@ -913 +913 @@ chompUntilEndOr str =
-        Elm.Kernel.Parser.findSubString str s.offset s.row s.col s.src
+        findSubString str s.offset s.row s.col s.src

rupertlssmith commented 1 year ago

If this patch fixes:

https://github.com/elm/parser/issues/20

does that mean it should be preferred to the PR that fixes just that issue?

https://github.com/elm/parser/pull/21

This PR seems more general than that one and fixes a number of issues together.

pithub commented 1 year ago

Hi Rupert, I think you addressed the question to me, but as the author I'm biased, of course. If I didn't prefer this PR to #21, I wouldn't have submitted it.