avh4 / elm-format

elm-format formats Elm source code according to a standard set of rules based on the official Elm Style Guide
BSD 3-Clause "New" or "Revised" License
1.31k stars 148 forks source link

Merge new-parser branch #754

Closed avh4 closed 2 years ago

avh4 commented 2 years ago

TODO:

avh4 commented 2 years ago

Hmm, I am sometimes getting a Prelude.read: no parse crash, but it is only partially reproducible. It seems to only happen after a large number of files have been processed, only in file overwrite mode, and only after 30-40 of them have had formatting changes written. It also seems to not happen if stdout is piped to another process.

I'm wondering if maybe the new parsing code we brought in from elm-compiler has some issue where it doesn't lock some of the memory that some of the parsed pointers refer to? And that's causing problems when lots of files are opened?

avh4 commented 2 years ago

Could it possibly be because of lazy reading and that the file starts to get overwritten before it's done parsing?

avh4 commented 2 years ago

So far, haven't been able to reproduce with --enable-profiling. But have reproduced both with -O0 and with -O2.

avh4 commented 2 years ago

Could it possibly be because of lazy reading and that the file starts to get overwritten before it's done parsing?

Seems unlikely because we read files with Relude.readFileBS, which is Data.ByteString.readFile which reads strictly.

avh4 commented 2 years ago

Confirmed the issues is from Parse.Number.parseFloat, and it is trying to parse an empty string.

debugging patch:

diff --git a/elm-format-lib/src/Parse/Number.hs b/elm-format-lib/src/Parse/Number.hs
index fbf881d1..7ee922dd 100644
--- a/elm-format-lib/src/Parse/Number.hs
+++ b/elm-format-lib/src/Parse/Number.hs
@@ -96,7 +96,13 @@ number toExpectation toError =

 parseFloat :: Ptr Word8 -> Ptr Word8 -> Double
 parseFloat pos end =
-  read $ Utf8.toChars $ Utf8.fromPtr pos end
+  readDebug $ Utf8.toChars $ Utf8.fromPtr pos end
+
+readDebug :: (Read a, Show a) => String -> a
+readDebug s = case [x | (x,t) <- reads s, ("","")  <- lex t] of
+  [x] -> x
+  []  -> error $ "readDebug: no parse: " ++ s
+  many   -> error $ "readDebug: ambiguous parse: " ++ show many
avh4 commented 2 years ago

I think this might be the issue (and fix):

diff --git a/elm-format-lib/src/Parse/Number.hs b/elm-format-lib/src/Parse/Number.hs
index fbf881d1..68d97bef 100644
--- a/elm-format-lib/src/Parse/Number.hs
+++ b/elm-format-lib/src/Parse/Number.hs
@@ -88,7 +88,8 @@ number toExpectation toError =
             OkFloat newPos representation ->
               let
                 !newCol = col + fromIntegral (minusPtr newPos pos)
-                !float = Float (parseFloat pos newPos) representation
+                !value = parseFloat pos newPos
+                !float = Float value representation
                 !newState = P.State src newPos end indent row newCol nl
               in
               cok float newState

!float I believe will not force evaluation of the arguments to Float, so the call to parseFloat might not have been evaluated until formatting, which could possibly be after the Ptrs are no longer valid. This differs from elm/compiler which doesn't actually parse the double, but did force the evaluation of the argument to Float.

Unfortunately it's hard to validate if this fixes, since the issue was only ~40% reproducible.