AndrasKovacs / flatparse

Fast parsing from bytestrings
MIT License
146 stars 12 forks

Case-insensitive keyword matching #10

Open cdsmith opened 3 years ago

cdsmith commented 3 years ago

I'm trying to migrate a megaparsec parser to flatparse, but it uses case-insensitive keywords. It would be really convenient to have case-insensitive versions of string, char, and switch. char isn't hard to do. The other two look considerably harder, since the number of branches grows exponentially with keyword length if the same trie technique is used. Adding the API anyway would be really helpful, even if the result is slower than the case-sensitive variant.

Depending on how my initial trials go with flatparse, I may have cycles to work on this, if you're interested in a pull request.

AndrasKovacs commented 3 years ago

This could be implemented by doing string and switch Char-by-Char, calling toLower on every character read. The Latin letter part should be special-cased though, since the standard toLower goes on an extended trip in https://hackage.haskell.org/package/base-4.15.0.0/src/GHC/Unicode.hs. Of course, string and switch would be slower this way.
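A minimal sketch of the idea, using only base and plain String rather than flatparse's own parser type. The names asciiToLower and matchKeywordCI are hypothetical and not part of flatparse; the keyword is assumed to be given in lowercase already.

```haskell
import Data.Char (chr, ord)

-- | Lowercase only the ASCII Latin letters 'A'..'Z' and leave every other
-- Char unchanged. This is the special-cased fast path that avoids the
-- general Unicode toLower.
asciiToLower :: Char -> Char
asciiToLower c
  | c >= 'A' && c <= 'Z' = chr (ord c + 32)
  | otherwise            = c

-- | Hypothetical helper: match a keyword (assumed to be lowercase) against
-- the start of the input, Char-by-Char, lowering each input character
-- before comparing. Returns the remaining input on success.
matchKeywordCI :: String -> String -> Maybe String
matchKeywordCI []     rest = Just rest
matchKeywordCI (k:ks) (c:cs)
  | asciiToLower c == k = matchKeywordCI ks cs
matchKeywordCI _      _ = Nothing
```

For example, matchKeywordCI "select" "SeLeCt * from t" evaluates to Just " * from t", while matchKeywordCI "select" "sel" evaluates to Nothing. A real implementation inside flatparse would work on the underlying bytes instead of String, and would need a separate path for non-ASCII letters.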

I'm interested in pull requests. I can also make you a collaborator if you expect to contribute substantially.