unhammer closed this 2 years ago
The `-t` option is related, but I think it is currently broken; see #8.
https://github.com/apertium/lttoolbox/commit/89c2a0600ba2a739b8ab8ed7120f9cf0d9a5301a can probably be simplified (the `-t` code looked simpler, but doesn't seem to support word blanks). Still, it seems to DTRT: it runs in 0.7s on something that regular analysis takes 3.2s on, while wake-up-mark-pgen takes 0.2s, which seems acceptable. I still have to check @khannatanmai's extensive pgen test suite.
@unhammer I've made the relevant modifications to the tests in abc337d. It currently fails the first wblank test and I haven't made enough sense of the wblank logic to track down the issue.
> Say for all words in your dictionary, you want to apply the rule `…inh t…` → `…is…`. It's just noisy to have to add a `<a/>` (or explicit `~` in hfst/lexc) to the RL form-side of every place in your dictionary where that happens, and it's especially noisy if the parts of the form `inh` are generated by different pardefs.

It occurs to me that this could also be fixed by composing
"postgen"
0:%~ <=> _ i n h .#. ;
with the generator (though making postgen able to handle this directly is probably still a good idea).
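
For illustration only, here is roughly what that rule would do to each generated form, written as a small Python sketch. The function name `add_wakeup_mark` is made up for this example, and the real thing would be a transducer composition in hfst rather than a script; this only mimics the effect of inserting `~` before a word-final `inh`.

```python
import re

def add_wakeup_mark(form: str) -> str:
    """Insert the wake-up mark '~' before 'inh' when it ends the form.

    Hypothetical helper: it imitates the two-level rule
    0:%~ <=> _ i n h .#. on a single surface form.
    """
    # \Z anchors at the very end of the string (the .#. boundary in the rule).
    return re.sub(r"inh\Z", "~inh", form)


if __name__ == "__main__":
    for form in ["foinh", "inhabit", "inh"]:
        print(form, "->", add_wakeup_mark(form))
```

With the mark added centrally like this, the existing wake-up-mark postgen could fire without touching the individual dictionary entries.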
The fix was reverted in 957bc093afcb8def28fe583946ada3b8ac57f85d due to #123.
Post-generation should be able to just run on everything LRLM and only apply the changes where it matches (as if it were a version of sed that respects deformatting).
Say for all words in your dictionary, you want to apply the rule `…inh t…` → `…is…`. It's just noisy to have to add a `<a/>` (or explicit `~` in hfst/lexc) to the RL form-side of every place in your dictionary where that happens, and it's especially noisy if the parts of the form `inh` are generated by different pardefs.

If postgen didn't need a wake-up mark, but stayed awake constantly, you could just put
`<l>inh<b/>t</l> <r>is</r>`
in post.dix and not make any changes to the generator at all.

This might have to be a new option (`lt-proc -P, --post-generation-everywhere` or something).

(via https://sourceforge.net/p/apertium/mailman/message/36600451/)
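
To make the "sed that respects deformatting" idea concrete, here is a minimal Python sketch of the intended behaviour, not of how lttoolbox implements or would implement it. The names `RULES` and `apply_everywhere` are hypothetical, the rule table is hard-coded rather than read from a post.dix, and superblanks are assumed to be plain `[...]` spans without escaped brackets. It also does not handle a superblank sitting in the middle of a match.

```python
import re

# Hypothetical rule table (left side -> right side); a real setup would
# compile these from post.dix instead.
RULES = {"inh t": "is"}

# Simplifying assumption: a superblank is "[...]" with no escaped brackets.
SUPERBLANK = re.compile(r"\[[^\]]*\]")


def apply_everywhere(text: str, rules: dict = RULES) -> str:
    """Apply rules left to right, longest match first, with no wake-up mark.

    Superblanks are copied through untouched, so formatting is never rewritten.
    """
    # Longest left sides first, so the longest match wins at each position.
    lefts = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        blank = SUPERBLANK.match(text, i)
        if blank:
            # Pass formatting through verbatim and continue after it.
            out.append(blank.group(0))
            i = blank.end()
            continue
        for left in lefts:
            if text.startswith(left, i):
                out.append(rules[left])
                i += len(left)
                break
        else:
            # No rule matched here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_everywhere("[inh t]ainh tb"))
```

The point of the sketch is that text which matches no rule passes through unchanged, which is what would let the generator dictionary stay free of `<a/>` marks.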