dhowe / rita

Website, documentation and examples for RiTa
https://rednoise.org/rita

Can we generate verb-lists from lexicon #143

Closed: dhowe closed this issue 3 years ago

dhowe commented 3 years ago

Can we generate the verb lists from the lexicon, specifically VB_ENDS_IN_E and VB_ENDS_IN_DOUBLE?
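A minimal sketch of what generating the two lists from the lexicon could look like. It assumes a hypothetical lexicon shape (a map from word to space-separated POS tags, with `vb` marking base-form verbs). It also assumes VB_ENDS_IN_E means base-form verbs ending in "e" and VB_ENDS_IN_DOUBLE means base-form verbs ending in a doubled consonant. The class and method names are illustrative, not RiTa's API.

```java
import java.util.*;

public class VerbListGenerator {

    // Assumed lexicon shape: word -> space-separated POS tags, e.g. "vb nn".
    // RiTa's real lexicon format may differ.
    static boolean isBaseVerb(String tags) {
        return Arrays.asList(tags.trim().split("\\s+")).contains("vb");
    }

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Assumed meaning of VB_ENDS_IN_E: base-form verbs ending in 'e' (e.g. "love").
    static List<String> endsInE(Map<String, String> lexicon) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : lexicon.entrySet()) {
            if (isBaseVerb(e.getValue()) && e.getKey().endsWith("e")) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out); // stable order, e.g. for hard-coding into a file
        return out;
    }

    // Assumed meaning of VB_ENDS_IN_DOUBLE: base-form verbs ending in a
    // doubled consonant (e.g. "fill", "kiss").
    static List<String> endsInDouble(Map<String, String> lexicon) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : lexicon.entrySet()) {
            String w = e.getKey();
            int n = w.length();
            if (isBaseVerb(e.getValue()) && n >= 2
                    && w.charAt(n - 1) == w.charAt(n - 2)
                    && Character.isLetter(w.charAt(n - 1))
                    && !isVowel(w.charAt(n - 1))) {
                out.add(w);
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("love", "vb nn");
        lexicon.put("fill", "vb nn");
        lexicon.put("walk", "vb nn");
        lexicon.put("table", "nn");
        lexicon.put("kiss", "vb nn");
        System.out.println(endsInE(lexicon));     // [love]
        System.out.println(endsInDouble(lexicon)); // [fill, kiss]
    }
}
```

The same two filters could either be run once offline to produce hard-coded lists, or be run when the conjugator is initialised, trading start-up time for a smaller source file.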

dhowe commented 3 years ago

Also, IRREG_VERBS_NOLEX is missing in Java.

Real-John-Cheung commented 3 years ago

> Can we generate verb-lists from lexicon, specifically VB_ENDS_IN_E and VB_ENDS_IN_DOUBLE ??

Yes, we can. In fact, the lists are currently generated from the lexicon and then hard-coded into the file. (Maybe we should do this generation when initializing the Conjugator? It might slow down initialization, though.) This is also related to:

> IRREG_VERBS_NOLEX is missing in java

It is missing because, if we hard-coded it as well, the static initializer of Conjugator would exceed the JVM's 65535-byte limit on a method's bytecode. (Another solution: keep the lists in a separate file?)
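For context: the JVM caps each method, including a class's static initializer, at 65535 bytes of bytecode, and every element of a hard-coded array literal compiles to several instructions inside that initializer. A minimal sketch of the "separate file" alternative follows. It assumes a hypothetical one-entry-per-line text resource; the resource name, the comment syntax, and the `WordListLoader` class are illustrative, not something RiTa ships.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class WordListLoader {

    // Parses one entry per line, skipping blank lines and '#' comments.
    // This file format is an assumption made for illustration.
    static Set<String> parse(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty() && !line.startsWith("#")) {
                    words.add(line);
                }
            }
        }
        return words;
    }

    // Loads a classpath resource, e.g. "irreg_verbs_nolex.txt" (hypothetical name),
    // resolved relative to this class's package.
    static Set<String> loadResource(String name) throws IOException {
        InputStream stream = WordListLoader.class.getResourceAsStream(name);
        if (stream == null) {
            throw new FileNotFoundException(name);
        }
        return parse(new InputStreamReader(stream, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Set<String> words = parse(new StringReader("# comment\nbe\n\nhave\n"));
        System.out.println(words.size()); // 2
    }
}
```

Because the entries are read at runtime, they add nothing to the size of the class's static initializer; the cost moves to file I/O on first load.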

Real-John-Cheung commented 3 years ago

I implemented the method in the PRs above. It did slow down initialisation, but I don't think by much.

dhowe commented 3 years ago

I've refactored unconjugate() and its tests:

  1. lazy-loaded the verb lists, so they are only built the first time unconjugate() is called
  2. removed unneeded regular expressions (these are slower than simple string methods)
  3. added tests for a few missing cases
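A minimal sketch of the first two refactors, assuming Java's initialization-on-demand holder idiom for the lazy loading and `endsWith`/`substring` in place of a regular expression. The verb list, `loadEndsInE()`, the `loads` counter, and the single "-ed" rule are hypothetical simplifications, not RiTa's actual unconjugate() logic.

```java
import java.util.*;

public class LazyUnconjugator {

    // Holder idiom: the nested class, and therefore the (possibly large)
    // verb set, is initialized only when VerbLists.ENDS_IN_E is first read.
    private static final class VerbLists {
        static final Set<String> ENDS_IN_E = loadEndsInE();
    }

    static int loads = 0; // counts how often the list is built (demo only)

    // Hypothetical loader; in practice this might scan the lexicon or read a file.
    private static Set<String> loadEndsInE() {
        loads++;
        return new HashSet<>(Arrays.asList("love", "bake", "hope"));
    }

    // Simplified "-ed" handling with plain string methods instead of a regex.
    // Illustrative only: real unconjugation covers many more cases.
    static String unconjugate(String word) {
        if (word.endsWith("ed")) {
            String stem = word.substring(0, word.length() - 2); // "loved" -> "lov"
            if (VerbLists.ENDS_IN_E.contains(stem + "e")) {
                return stem + "e"; // restore the dropped 'e'
            }
            return stem;
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(loads);                 // 0: nothing built yet
        System.out.println(unconjugate("loved"));  // love
        System.out.println(unconjugate("walked")); // walk
        System.out.println(loads);                 // 1: built once, on first use
    }
}
```

The holder idiom relies on the JVM initializing `VerbLists` lazily and thread-safely the first time one of its static fields is read, so no explicit locking is needed.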

TODO: