AndrasKovacs / flatparse

Fast parsing from bytestrings
MIT License

Alternative to runParserST that stays in the ST monad #44

Closed: 0rphee closed this issue 1 year ago

0rphee commented 1 year ago

The existing function for running parsers in the ST monad is defined as follows (in the Stateful version):

```haskell
-- | Run an `ST`-based parser. The `Int` argument is the initial state.
runParserST :: (forall s. ParserST s r e a) -> r -> Int -> B.ByteString -> Result e a
runParserST pst !r i buf = unsafeDupablePerformIO (runParserIO pst r i buf)
{-# inlinable runParserST #-}
```

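This works for self-contained ST computations. However, if I want to run the parser inside a larger ST computation, or use an STRef (instead of, say, an MVar), it is impossible: runParserST requires a parser that is polymorphic in s and returns a pure Result, so the parser cannot share the state thread s of the surrounding computation.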
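The restriction comes from the rank-2 argument type, the same pattern runST itself uses: an argument of type forall s. ... must work for every state thread, so it cannot touch an STRef that belongs to the caller's thread. A minimal base-only illustration, independent of flatparse (addAll and sumViaRef are hypothetical names chosen for this sketch):

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')

-- Rejected by GHC with a rigid type variable mismatch, the same kind of
-- error as in the scanTokens case: runST's argument must work for every
-- state thread, but readSTRef ref only works in the caller's thread s.
--
-- leak :: STRef s Int -> ST s Int
-- leak ref = pure (runST (readSTRef ref))

-- Accepted: stay in the caller's ST s and use the STRef directly.
addAll :: STRef s Int -> [Int] -> ST s ()
addAll ref = mapM_ (\x -> modifySTRef' ref (+ x))

-- A pure wrapper is recovered with a single runST at the outermost level.
sumViaRef :: [Int] -> Int
sumViaRef xs = runST $ do
  ref <- newSTRef 0
  addAll ref xs
  readSTRef ref
```

Here sumViaRef [1, 2, 3] evaluates to 6; the STRef never leaves the one ST computation that created it.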

This problem came up when I was trying to collect multiple parse errors. Initially I simply aborted parsing with an error, which of course only reported the first error and stopped there. Then I used MVars successfully, and tried to switch to STRefs.
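Outside of any parser, the error-collection idea looks roughly like this when everything shares one ST s thread. This is a simplified base-only sketch, not flatparse code: ScanErr here is a hypothetical stand-in for the reporter's type (a position plus the offending character), and scanDigits walks a String rather than a ByteString:

```haskell
import Control.Monad.ST (ST, runST)
import Data.Char (digitToInt, isDigit)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')

-- Hypothetical error type: position of the bad character and the character.
data ScanErr = ScanErr Int Char
  deriving (Eq, Show)

-- Scan digits, recording every non-digit in the STRef instead of aborting
-- at the first one. Errors are consed on, so they are stored newest first.
scanDigits :: STRef s [ScanErr] -> String -> ST s [Int]
scanDigits errs = go 0
  where
    go _ [] = pure []
    go pos (c : cs)
      | isDigit c = (digitToInt c :) <$> go (pos + 1) cs
      | otherwise = do
          modifySTRef' errs (ScanErr pos c :)
          go (pos + 1) cs

-- Run the scanner and return all collected errors in source order.
scanAll :: String -> ([ScanErr], [Int])
scanAll input = runST $ do
  errs <- newSTRef []
  digits <- scanDigits errs input
  collected <- readSTRef errs
  pure (reverse collected, digits)
```

Here scanAll "1a2b3" evaluates to ([ScanErr 1 'a',ScanErr 3 'b'],[1,2,3]): both bad characters are reported, not just the first.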

Using the Stateful version, with an STRef s ScanErr as the reader environment, I found that the existing runParserST function didn't work for this use case:

```haskell
-- the parser to be run
scanTokens :: ParserT (STMode s) (STRef s ScanErr) e (Vector Token)
```

```
    • Couldn't match type ‘s1’ with ‘s’
      Expected: ParserST s1 (STRef s ScanErr) e (Vector Token)
        Actual: ParserT (STMode s1) (STRef s1 ScanErr) e (Vector Token)
      ‘s1’ is a rigid type variable bound by
        a type expected by the context:
          forall s1. ParserST s1 (STRef s ScanErr) e (Vector Token)
        at app/Scanner.hs:66:25-34
      ‘s’ is a rigid type variable bound by
        the type signature for:
          scanFile :: forall s.
                      ByteString
                      -> ST
                           s
                           (Either CodeError (Vector ScannerError, Vector Token, ByteString))
        at app/Scanner.hs:62:1-97
    • In the first argument of ‘runParserST’, namely ‘scanTokens’
      In the expression: runParserST scanTokens stRef 0 bs
      In an equation for ‘res’: res = runParserST scanTokens stRef 0 bs
    • Relevant bindings include
        stRef :: STRef s ScanErr (bound at app/Scanner.hs:64:3)
        scanFile :: ByteString
                    -> ST
                         s
                         (Either CodeError (Vector ScannerError, Vector Token, ByteString))
          (bound at app/Scanner.hs:63:1)
   |
66 |   let res = runParserST scanTokens stRef 0 bs
   |                         ^^^^^^^^^^
```

Quite possibly I'm just contorting myself by doing this with ST instead of IO. But if it can be done this way anyway, why not? haha!

So I hacked together this alternative function, based on the implementation of the original runParserST:

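```haskell
-- Needs BangPatterns, unsafeIOToST (Control.Monad.ST.Unsafe) and unsafeCoerce (Unsafe.Coerce).
unsafeRunParserST :: ParserST s (STRef s c) e a -> STRef s c -> Int -> ByteString -> ST s (Result e a)
unsafeRunParserST pst !r i buf = unsafeIOToST (runParserIO (unsafeCoerce pst) r i buf)
```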

I think it would be very nice to make something like this available to all flatparse users. This version has worked fine for my use case, but I don't know enough about the intricacies of ST and IO to guarantee that this function is safe. Even if it were, I don't know how to generalize it, since not everyone will want an environment of exactly type STRef s c.

Anyway, I thought it would be worthwhile to share this with someone more knowledgeable than me.

Thanks for the great library!

AndrasKovacs commented 1 year ago

Indeed, runParserST should return ST s (Result e a) if we want to be consistent with runParserIO. We can always just runST the result. I think this actually merits a mild breaking change here. I'll push this to Hackage shortly.
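The shape of this change can be sketched on a hypothetical toy parser type (Toy, runToyST, runToy, toyChar, toyCountedChar and countedParse are illustrative names, not flatparse's API). The ST-returning runner fixes the parser to the caller's s, so the environment r can be any type, including an STRef s c, and the old pure behaviour is recovered by wrapping the call in runST:

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')

-- Hypothetical stand-in for ParserST s r e a: reads an environment r,
-- consumes a String, and may perform ST effects in thread s.
newtype Toy s r a = Toy (r -> String -> ST s (Maybe (a, String)))

-- ST-returning runner: the parser shares the caller's state thread s,
-- so r may mention s (e.g. r = STRef s c).
runToyST :: Toy s r a -> r -> String -> ST s (Maybe (a, String))
runToyST (Toy f) = f

-- Pure runner recovered with runST, like the old rank-2 signature.
runToy :: (forall s. Toy s r a) -> r -> String -> Maybe (a, String)
runToy p r input = runST (runToyST p r input)

-- Consume one character, ignoring the environment.
toyChar :: Toy s r Char
toyChar = Toy $ \_ input -> pure $ case input of
  [] -> Nothing
  c : rest -> Just (c, rest)

-- Consume one character and bump a counter held in the STRef environment.
toyCountedChar :: Toy s (STRef s Int) Char
toyCountedChar = Toy $ \ref input -> case input of
  [] -> pure Nothing
  c : rest -> do
    modifySTRef' ref (+ 1)
    pure (Just (c, rest))

-- Run a parser inside a larger ST computation that owns the STRef.
countedParse :: String -> (Maybe (Char, String), Int)
countedParse input = runST $ do
  ref <- newSTRef 0
  res <- runToyST toyCountedChar ref input
  n <- readSTRef ref
  pure (res, n)
```

For example, countedParse "ab" evaluates to (Just ('a',"b"),1), and runToy toyChar () "xy" to Just ('x',"y"). Leaving r fully polymorphic in the runner, rather than fixing it to STRef s c, also covers the generalization concern raised in the issue.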

AndrasKovacs commented 1 year ago

Implemented in commit 1559b56a0feafb93661675fb406bc66894115fda. Pushed to Hackage.