dbuenzli / uutf

Non-blocking streaming Unicode codec for OCaml
http://erratique.ch/software/uutf
ISC License

Add ability to peek at the next decoder character #18

Closed brandonf2002 closed 7 months ago

brandonf2002 commented 7 months ago

Hi, I'm currently writing a parser/lexer for a language that makes use of UTF-8 characters as part of its grammar and thought it would be useful to be able to "peek" at the next character in the decoder stream without advancing the decoder position forward.

There are plenty of workarounds for this situation, but I thought it might be a useful feature to have built in.

I am proposing a function that would look like:

val decoder_peek : decoder ->
  [ `Await | `Uchar of Uchar.t | `End | `Malformed of string]

I would be more than happy to write this PR myself; I am opening this issue first to see whether such a patch would be accepted.

dbuenzli commented 7 months ago

Thanks for making an issue first. The answer is no, for the following reasons:

  1. It's not too hard to integrate that lookahead in your lexer or parser state. See here for an example.

  2. In general, uutf is in maintenance mode: I don't plan to add any new features, and I encourage people to directly use the stdlib decoders that have been integrated in OCaml 4.14.0 (https://github.com/ocaml/ocaml/pull/10710). They are also likely more performant (at least they allocate less on decodes).
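
For readers landing here, both points can be illustrated with a small sketch. It is not code from uutf or from the linked example. It assumes the whole input is already in memory as a string (so there is no `Await` case) and uses the stdlib's `String.get_utf_8_uchar` from OCaml 4.14. The `lexer` record and the `decode_at`, `peek` and `next` functions are hypothetical names chosen for illustration. The lookahead lives in the lexer state: `peek` decodes at the current byte offset without moving it, and `next` decodes and then advances.

```ocaml
(* Requires OCaml >= 4.14 for String.get_utf_8_uchar and Uchar.utf_decode_*. *)

(* Hypothetical lexer state: the full input and a byte offset into it. *)
type lexer = { src : string; mutable pos : int }

(* Same shape as the proposed result type, minus `Await (input is in memory). *)
type item = [ `Uchar of Uchar.t | `Malformed of string | `End ]

(* Decode the scalar value at the current offset.
   Returns the decoded item and the number of bytes it occupies. *)
let decode_at lx : item * int =
  if lx.pos >= String.length lx.src then (`End, 0)
  else
    let d = String.get_utf_8_uchar lx.src lx.pos in
    let len = Uchar.utf_decode_length d in
    if Uchar.utf_decode_is_valid d
    then (`Uchar (Uchar.utf_decode_uchar d), len)
    else (`Malformed (String.sub lx.src lx.pos len), len)

(* Look at the next item without advancing the offset. *)
let peek lx : item = fst (decode_at lx)

(* Return the next item and advance past it. *)
let next lx : item =
  let item, len = decode_at lx in
  lx.pos <- lx.pos + len;
  item
```

Because decoding from a string is stateless, calling `peek` repeatedly is safe and always returns the same item until `next` is called. With a streaming `Uutf.decoder`, a comparable approach would be to keep the last decoded value in an option field of the lexer state instead of re-decoding.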