How to use with Unicode?

I recently wrote a small program with pom. I found the API lovely, but I found it very hard to get string values into the library. All the sample code uses <u8, T> parsers, with literals written as byte strings like b"char". It is clear how to use this with ASCII, but not with Unicode.

If I write my parsers as <char, T> instead, then of course parse() cannot accept strings, because pom then expects a slice of chars and a string is UTF-8 bytes. I can convert the string to a Vec of chars first, but for very long strings this will be inefficient.
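To make the cost of that conversion concrete, here is a minimal std-only sketch. The helper names `to_chars` and `char_buffer_bytes` are made up for illustration and are not part of pom; the resulting `&chars` slice is what I assume a <char, T> parser would be handed.

```rust
/// Collect a string's Unicode scalar values into an owned buffer.
/// This is one O(n) pass over the text plus a new allocation.
pub fn to_chars(text: &str) -> Vec<char> {
    text.chars().collect()
}

/// Bytes occupied by the elements of a `char` slice.
/// A Rust `char` is always 4 bytes, regardless of the code point.
pub fn char_buffer_bytes(chars: &[char]) -> usize {
    chars.len() * std::mem::size_of::<char>()
}
```

For mostly-ASCII input the char buffer is roughly four times the size of the original string, which is the inefficiency I am worried about.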

I see the convert() function can be used to easily (efficiently?) interpret a string as a sequence of bytes, so maybe it is okay to just use <u8, T>. However, then I have a different problem. What if I want Unicode literals (maybe sym('🐈'), if for some reason 🐈 is a separator) or Unicode ranges (for example code points U+1100 to U+11FF [ᄀ..ᇿ])?
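For reference, this is roughly what a literal and a range check look like when done by hand over UTF-8 bytes. It is a std-only sketch: `utf8_len`, `first_char`, `char_in_range` and `literal` are hypothetical helpers, not pom functions, and how they would be wired into a <u8, T> parser is left open.

```rust
/// Length in bytes of the UTF-8 sequence that starts with `lead`,
/// or `None` if `lead` cannot start a sequence.
fn utf8_len(lead: u8) -> Option<usize> {
    match lead {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None,
    }
}

/// Decode the first scalar value of `input`, returning it with its byte length.
/// `from_utf8` validates the prefix, so malformed or truncated input yields `None`.
pub fn first_char(input: &[u8]) -> Option<(char, usize)> {
    let len = utf8_len(*input.first()?)?;
    let prefix = input.get(..len)?;
    let ch = std::str::from_utf8(prefix).ok()?.chars().next()?;
    Some((ch, len))
}

/// Match one scalar value in `lo..=hi` at the start of `input`.
pub fn char_in_range(input: &[u8], lo: char, hi: char) -> Option<(char, usize)> {
    first_char(input).filter(|(ch, _)| (lo..=hi).contains(ch))
}

/// Match the literal `expected` at the start of `input` by comparing
/// its UTF-8 encoding; returns the number of bytes consumed.
pub fn literal(input: &[u8], expected: char) -> Option<usize> {
    let mut buf = [0u8; 4];
    let bytes = expected.encode_utf8(&mut buf).as_bytes();
    if input.starts_with(bytes) {
        Some(bytes.len())
    } else {
        None
    }
}
```

A literal is easy because it is just a fixed byte sequence, but a range needs the decode step, since a code point range does not map onto a simple range of single bytes.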

Do I have to say seq("🐈".as_bytes()) every time? How then do I do character ranges?
Could pom be made to consume iterators instead of [T] slices, so parse() could take string.chars() as an argument?
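To illustrate what I mean by consuming an iterator, here is a rough std-only sketch. It is not pom's API; `take_in_range` and `attempt` are hypothetical names. Since `str::chars()` returns a cloneable iterator, I assume backtracking could be done by cloning it before trying an alternative.

```rust
use std::iter::Peekable;

/// Consume leading chars in `lo..=hi` from a char iterator.
/// Decoding happens lazily; no intermediate Vec<char> is built.
pub fn take_in_range<I>(input: &mut Peekable<I>, lo: char, hi: char) -> String
where
    I: Iterator<Item = char>,
{
    let mut out = String::new();
    while let Some(&ch) = input.peek() {
        if ch < lo || ch > hi {
            break;
        }
        out.push(ch);
        input.next();
    }
    out
}

/// Run `parser`; if it fails, restore the iterator to its saved position.
/// This is the backtracking step, which needs the iterator to be `Clone`.
pub fn attempt<I, T>(input: &mut I, parser: impl Fn(&mut I) -> Option<T>) -> Option<T>
where
    I: Iterator<Item = char> + Clone,
{
    let saved = input.clone();
    let result = parser(input);
    if result.is_none() {
        *input = saved;
    }
    result
}
```

With something like this, `text.chars().peekable()` could be passed in directly and ranges would be ordinary char comparisons.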

J-F-Liu / pom