Open ftyers opened 4 years ago
Or it could be unioned with some other section after that section has been minimized, to avoid having to create a new section in the binary.
@mr-martian that sounds a bit more complicated. Also, it would be cool to be able to give weights to sections, but I'll open another issue for that.
Upon poking around a bit, I've determined that this would not break the binary format, since section types are just encoded as strings and `lt-proc` already handles multiple sections of the same type. Having `lt-comp` relabel `type="regex"` to `type="standard"` would result in complete backwards compatibility, or `lt-proc` can just recognize section names ending in `@regex` and treat them like `@standard`.
Either way, this should probably be accompanied by a way to mark `<pardef>`s as non-minimizing for the same reason. `regex="yes"`, perhaps.
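To make the suggestion concrete, here is a rough sketch of what a non-minimizing paradigm might look like in a `.dix` file. The `regex="yes"` attribute is only the proposal from this comment and does not exist in the current format; the paradigm name `num_regex` and the symbol `num` are made up for illustration.

```xml
<dictionary>
  <sdefs>
    <sdef n="num"/>
  </sdefs>
  <pardefs>
    <!-- regex="yes" is the HYPOTHETICAL marker proposed above:
         lt-comp would skip minimization for this pardef. -->
    <pardef n="num_regex" regex="yes">
      <e><re>[0-9]+</re><p><l/><r><s n="num"/></r></p></e>
    </pardef>
  </pardefs>
</dictionary>
```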
This should be optional. For development it should be fast to compile and test, but for distribution it should heavily optimize to the smallest/fastest output binary.
Also, it occurs to me that this is tricky because `lt-comp` minimizes each pardef separately in addition to each section.
But this is about speed – is minimising each pardef on its own slow? (Last time I checked, the section minimisation at the end was the slow step.)
Another alternative is that 0493630 added the ability to compile dictionaries in several pieces, which should alleviate the burden of frequently recompiling the regex sections.
In fact, we could have globally shared regex sections, as proposed in https://github.com/apertium/apertium/pull/161
Minimisation has gotten quite a bit faster lately, but there's a related PR at https://github.com/apertium/lttoolbox/pull/165
At the moment we add regexes in sections. Minimising regexes takes a long time. So perhaps we could have a special `type="regex"` section that does not get minimised; this would speed up compilation of regex-heavy dictionaries. This will likely break binary compatibility.
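As an illustration of the proposal, a dictionary using such a section might look roughly like the sketch below. `type="regex"` is the proposed value and is not accepted by `lt-comp` today; the section ids, the entry for "house" and the symbols are invented for the example.

```xml
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"/>
    <sdef n="num"/>
  </sdefs>
  <!-- Ordinary entries: compiled and minimised as usual. -->
  <section id="main" type="standard">
    <e><p><l>house</l><r>house<s n="n"/></r></p></e>
  </section>
  <!-- PROPOSED: regex entries kept in their own section,
       which the compiler would leave unminimised. -->
  <section id="regexes" type="regex">
    <e><re>[0-9]+</re><p><l/><r><s n="num"/></r></p></e>
  </section>
</dictionary>
```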