I recently wrote a small program with pom. I found the API lovely, but I found it very hard to get string values into the library. All the sample code is written with <u8, T> parsers, and the literals are written like b"char". It is clear how to use this with ASCII, but not with Unicode.
If I instead try to write my parsers as <char, T>, then of course parse() cannot accept strings, because pom expects an array of chars and a string is UTF-8 bytes. I can convert the string to an array of chars, but for very long strings this will be inefficient.
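To make the cost concrete, here is a minimal std-only sketch of that conversion. It does not use pom at all, and the helper names `to_char_buffer` and `char_buffer_bytes` are made up for illustration. It only shows that collecting into a `Vec<char>` copies the whole input and stores every character in 4 bytes, however short its UTF-8 encoding is.

```rust
use std::mem::size_of;

/// Decode a UTF-8 string into an owned buffer of `char`s, the shape a
/// slice-based `<char, T>` parser would need as input.
/// (Hypothetical helper, not part of pom.)
fn to_char_buffer(text: &str) -> Vec<char> {
    // Walks the whole string once and allocates a new buffer.
    text.chars().collect()
}

/// Bytes occupied by the decoded elements (spare capacity not counted).
/// A Rust `char` is always 4 bytes wide.
fn char_buffer_bytes(chars: &[char]) -> usize {
    chars.len() * size_of::<char>()
}
```

For mostly-ASCII input this means roughly four times the memory of the original `&str`, plus one full decoding pass before parsing even starts.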
I see the convert() function can be used to easily (efficiently?) interpret a string as a sequence of bytes, so maybe it is okay to just use <u8, T>. However, then I have a different problem. What if I want to have Unicode literals (maybe sym('π'), if for some reason π is a separator) or Unicode ranges (for example codepoints U+1100 to U+11FF, [ᄀ..ᇿ])?
Do I have to say seq("π".as_bytes()) every time? How then do I do character ranges?
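To illustrate what I mean, here is a rough std-only sketch of doing a codepoint range check by hand on a byte slice. Nothing in it is pom API: `utf8_len` and `take_char_in_range` are hypothetical helpers I made up, and I am only assuming something like this could be wrapped in a custom `<u8, char>` parser. A literal is simple because `"π".as_bytes()` is just the two bytes 0xCF 0x80, but a range needs the character to be decoded first.

```rust
/// Length of a UTF-8 sequence judged by its leading byte, or `None` if the
/// byte cannot start a sequence. (Hypothetical helper, not part of pom.)
fn utf8_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None, // continuation byte or invalid leading byte
    }
}

/// Decode one character from the front of `input` and accept it only if it
/// lies in `lo..=hi`. Returns the character and the number of bytes consumed.
fn take_char_in_range(input: &[u8], lo: char, hi: char) -> Option<(char, usize)> {
    let len = utf8_len(*input.first()?)?;
    let head = input.get(..len)?; // `None` if the input is truncated
    // `from_utf8` validates the continuation bytes for us.
    let c = std::str::from_utf8(head).ok()?.chars().next()?;
    if (lo..=hi).contains(&c) {
        Some((c, len))
    } else {
        None
    }
}
```

Calling `take_char_in_range(input, '\u{1100}', '\u{11FF}')` would then stand in for the range I want, but writing this for every parser that touches non-ASCII text feels like something the library should handle.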
Could pom be made to consume iterators instead of [T] arrays, so parse() could take string.chars() as an argument?