gfngfn / SATySFi

A statically-typed, functional typesetting system
GNU Lesser General Public License v3.0

Proposal: change regexp backend #306

Open leque opened 2 years ago

leque commented 2 years ago

Currently, SATySFi uses OCaml's Str module as the regexp backend. But it has some drawbacks:

So, how about changing the regexp backend from Str to, say, ocaml-re? It supports a large subset of PCRE syntax, performs well, and has no global state. Or pcre-ocaml? Another option would be to drop the regexp primitives from the core and use a regexp engine implemented as a library, such as the one in satysfi-base.

puripuri2100 commented 2 years ago

A regexp engine implemented as a SATySFi library has the disadvantage of being slow.

leque commented 2 years ago

After some investigation, I realized that ocaml-re does not support Unicode (https://github.com/ocaml/ocaml-re/issues/24). So pcre-ocaml with the `UTF8 flag would be the preferred choice. Unicode support is also required to fix the problem that regexp primitives (e.g. string-scan (regexp `.`) `あいう`) may generate an invalid string (cf. https://github.com/gfngfn/SATySFi/pull/56).
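
To illustrate why a byte-oriented `.` can yield an invalid string, here is a minimal OCaml sketch. It does not call Str or pcre-ocaml; it only simulates the two behaviours with the standard library (assuming OCaml 4.14 or later for the UTF-8 helpers). The names `first_byte` and `first_char` are made up for this example.

```ocaml
(* Assumption: OCaml >= 4.14, which provides String.is_valid_utf_8
   and String.get_utf_8_uchar in the standard library. *)

(* Three code points, each encoded as three bytes in UTF-8. *)
let s = "あいう"

(* What a byte-oriented `.` would capture at offset 0: a single byte. *)
let first_byte = String.sub s 0 1

(* What a UTF-8-aware `.` would capture at offset 0: one whole code point. *)
let first_char =
  let decoded = String.get_utf_8_uchar s 0 in
  String.sub s 0 (Uchar.utf_decode_length decoded)

let () =
  Printf.printf "length in bytes: %d\n" (String.length s);
  Printf.printf "first byte is valid UTF-8: %b\n"
    (String.is_valid_utf_8 first_byte);
  Printf.printf "first char is valid UTF-8: %b\n"
    (String.is_valid_utf_8 first_char)
```

The single-byte slice is the lead byte of `あ` without its continuation bytes, so it is not a valid UTF-8 string on its own, whereas the code-point slice is.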