apertium / lttoolbox

Finite state compiler, processor and helper tools used by apertium
http://wiki.apertium.org/wiki/Lttoolbox
GNU General Public License v2.0

Merge Postgen, Intergen, and Transliteration #144

Closed mr-martian closed 2 years ago

mr-martian commented 2 years ago

Postgen, Intergen, and Transliteration should all do essentially the same thing.

Differences:

  1. Intergen and Transliteration do not respect wblanks
  2. Postgen and Intergen ignore anything not preceded by ~
  3. Postgen deletes ~ from the output
  4. Postgen rereads the suffix of a match (presumably to allow for overlapping matches)

Of these, 1. is a bug, 2. would be eliminated by #42, and 4. is implemented badly (see #123) and I'm not convinced it's actually useful. That just leaves 3., which seems like it should be a flag modifying one function rather than a difference between three separate functions.

In this PR I merge all 3 modes into 1, with flags for 3. and 4., though 4. is currently not working.
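To make the "one function plus a flag" idea concrete, here is a toy sketch in Python. It is not lttoolbox's transducer code: `rewrite`, `postgen`, `transliterate`, and the dictionary-based `rules` are hypothetical names invented for illustration, and plain longest-match string replacement stands in for the real matching. The only point is that difference 3. (deleting ~ from the output) can be a boolean parameter rather than a separate code path.

```python
def rewrite(text, rules, delete_mark=False, mark="~"):
    """Toy stand-in for a merged mode: longest-match string rewriting.

    rules maps source strings to replacements. This only illustrates the
    shape of "one function with flags"; it is not how lttoolbox matches.
    """
    # Try longer sources first so the longest match wins; skip empty keys,
    # which would otherwise match forever without consuming input.
    sources = sorted((s for s in rules if s), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for src in sources:
            if text.startswith(src, i):
                out.append(rules[src])
                i += len(src)
                break
        else:
            ch = text[i]
            # The flag is the only difference between the two wrappers below.
            if not (delete_mark and ch == mark):
                out.append(ch)
            i += 1
    return "".join(out)


def postgen(text, rules):
    # Hypothetical wrapper: same function, mark deleted from the output.
    return rewrite(text, rules, delete_mark=True)


def transliterate(text, rules):
    # Hypothetical wrapper: same function, unmatched marks left in place.
    return rewrite(text, rules, delete_mark=False)
```

With `rules = {"~a b": "ab"}`, `postgen("x ~a b ~c", rules)` gives `"x ab c"` while `transliterate("x ~a b ~c", rules)` gives `"x ab ~c"`.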

I currently get correct output if I run any of the tests manually, but if I try to run make test it hangs on one of the lines in PostgenerationWordboundBlankTest until the pipe times out. Any help that can be given in terms of identifying the cause of this would be welcome.

xavivars commented 2 years ago

Intergen can probably be completely deleted. I introduced it as a hack to avoid multiplying even more bidixes and langpairs in spa-cat, but everything that was done with it has now been replaced with lex-tools, CG and preferences. I also think it had problems with other translation modes (docx or odt, if I remember properly), so I'm not sure I'd leave it around if no one uses it.

unhammer commented 2 years ago

Seems like this fixes https://github.com/apertium/lttoolbox/issues/145 =D

That one hanging line is strange. I don't get the hang if I just run it from the command line.

unhammer commented 2 years ago

@mr-martian Using daemon/client from https://wiki.apertium.org/wiki/Daemon#Flushing_examples_in_bash I do

lt-comp lr tests/data/postgen.dix tests/data/postgen.bin
./daemon lttoolbox/lt-proc -p -z tests/data/postgen.bin 

in one terminal and then

$ echo 'hi ~les la' |./client
hi le pe test la
$ echo '[[t:b:0]]~le[[/]] la n' |./client

↑ that last one hangs. Looks like it's simply not flushing or not outputting the final NUL?

Looking at your diff, I don't see any references to NUL flushing in the added code; the old one used these foogeneration_wrapper_null_flush helpers, don't know if they'd be useful.

TinoDidriksen commented 2 years ago

> Looking at your diff, I don't see any references to NUL flushing in the added code; the old one used these foogeneration_wrapper_null_flush helpers, don't know if they'd be useful.

I don't either, but just so it's clear: It should always null flush. Going forward, -z should be a no-op.
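For readers unfamiliar with the convention, this is roughly what "always null flush" means, as a minimal Python sketch. `process_null_flushed` and `transform` are made-up names, not lttoolbox functions. The input is treated as a sequence of NUL-terminated records, and after each record the processor writes its output, echoes the NUL, and flushes, so that a client waiting on the pipe knows the record is complete.

```python
import io


def process_null_flushed(inp, out, transform):
    """Read NUL-terminated records from inp and write transformed records to out.

    inp and out are binary streams; transform maps bytes to bytes.
    """
    buf = bytearray()
    while True:
        byte = inp.read(1)
        if not byte:  # EOF
            break
        if byte == b"\0":
            out.write(transform(bytes(buf)))
            out.write(b"\0")  # echo the terminator so the reader can stop waiting
            out.flush()       # push the record through the pipe immediately
            buf.clear()
        else:
            buf += byte
    if buf:
        # Trailing input with no NUL: emit it at EOF without a terminator.
        out.write(transform(bytes(buf)))
        out.flush()
```

If either the trailing NUL or the flush is skipped, a daemon-style client that reads until NUL will block, which matches the hang described above.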

mr-martian commented 2 years ago

For the sake of backwards compatibility, I'm inclined to leave intergeneration in as a synonym for transliteration.

And yeah, I thought it might be null-flushing, but I couldn't figure out how to reproduce it - will try again in a bit.

mr-martian commented 2 years ago

Ok, so now it works for everything I can think of to pass in manually, both directly and via the daemon, but when I run the tests I'm still getting one hanging. Any ideas?

unhammer commented 2 years ago

It seems to work now?

mr-martian commented 2 years ago

For some reason when I run make test locally it fails, but no matter how exactly I try to reproduce it manually, it seems to work just fine.
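One way to chase a hang like this outside of make test is a small harness that sends a single NUL-terminated record and waits for a NUL-terminated reply with a timeout. The sketch below is an assumption about how such a check could look, not the project's actual test code; `run_null_flush`, `GOOD_CHILD`, and `BAD_CHILD` are hypothetical, and the two child processes are tiny Python scripts standing in for lt-proc.

```python
import subprocess
import sys
import threading


def run_null_flush(cmd, record, timeout=5.0):
    """Send one NUL-terminated record to cmd; return its reply, or None on timeout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    reply = bytearray()
    done = threading.Event()

    def reader():
        # Collect bytes until the child echoes the NUL terminator.
        while True:
            b = proc.stdout.read(1)
            if not b:  # EOF without a terminator
                return
            if b == b"\0":
                done.set()
                return
            reply.extend(b)

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    proc.stdin.write(record + b"\0")
    proc.stdin.flush()
    finished = done.wait(timeout)
    proc.kill()  # closes the child's end of the pipe, which unblocks the reader
    proc.wait()
    t.join()
    proc.stdin.close()
    proc.stdout.close()
    return bytes(reply) if finished else None


# Stand-in children: both uppercase each record, but only one echoes the NUL.
_CHILD = r'''
import sys
r, w = sys.stdin.buffer, sys.stdout.buffer
buf = b""
while True:
    c = r.read(1)
    if not c:
        break
    if c == b"\0":
        w.write(buf.upper() + TERMINATOR)
        w.flush()
        buf = b""
    else:
        buf += c
'''
GOOD_CHILD = [sys.executable, "-c", _CHILD.replace("TERMINATOR", r'b"\0"')]
BAD_CHILD = [sys.executable, "-c", _CHILD.replace("TERMINATOR", 'b""')]
```

`run_null_flush(GOOD_CHILD, b"hi")` returns `b"HI"`, while `run_null_flush(BAD_CHILD, b"hi", timeout=1.0)` returns `None`: the output itself arrives, but the missing terminator leaves the reader waiting until the timeout.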

mr-martian commented 2 years ago

Updated summary:

Merge postgeneration, intergeneration, and transliteration