apertium / apertium-separable

Module for reordering separable/discontiguous multiwords.
https://wiki.apertium.org/wiki/Apertium_separable
GNU General Public License v3.0

lsx-comp --trace option that inserts (top-level rule) line numbers in output #54

Open unhammer opened 7 months ago

unhammer commented 7 months ago

We could compile a (larger) fst that outputs line numbers in the stream for every top-level rule application.

nob-nno has a PoC using xsltproc: https://github.com/apertium/apertium-nno-nob/commit/6d2a7e05c18ce390eb98c0719c66a7286a6e8c5c. It would be nicer to have an lsx-comp --trace, though. Also, nob-nno uses awk to get the line numbers, which will break if someone writes <e in the wrong place.
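As a rough illustration of why a real XML parser is less fragile here than matching `<e` textually, this is a minimal sketch that collects the line numbers of top-level entries with Python's standard `xml.parsers.expat`. It is not what the nob-nno PoC or lsx-comp does. The function name `top_level_entry_lines` and the sample are made up, and "top-level" is assumed to mean an `<e>` whose parent is `<section>`.

```python
import xml.parsers.expat

def top_level_entry_lines(xml_text):
    """Return 1-based line numbers of <e> elements that sit directly in a <section>.

    Assumption: a "top-level rule" is an <e> whose parent element is <section>;
    <e> elements inside <pardef> are skipped.
    """
    lines = []
    stack = []  # names of the elements that are currently open
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if name == "e" and stack and stack[-1] == "section":
            # CurrentLineNumber is the line where the current start tag begins
            lines.append(parser.CurrentLineNumber)
        stack.append(name)

    def end(name):
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return lines

# Hypothetical input: the <e in the comment would confuse a text match,
# but the parser never reports it as an element.
SAMPLE = """<dictionary>
  <pardefs>
    <pardef name="p">
      <e><i>x</i></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <!-- a stray <e in a comment does not count -->
    <e><p><l>a</l><r>b</r></p></e>
    <e>
      <p><l>c</l><r>d</r></p>
    </e>
  </section>
</dictionary>
"""

if __name__ == "__main__":
    print(top_level_entry_lines(SAMPLE))
```

The parent check relies on a stack of open element names, so an `<e>` nested in a pardef or mentioned in a comment is not counted.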

mr-martian commented 7 months ago

See if lsx-comp --debug does what you want.

unhammer commented 7 months ago

$ lsx-comp --debug apertium-nno-nob.nob-nno.lsx foo.bin
lsx-comp v3.7.6: build a letter transducer from a dictionary
USAGE: lsx-comp [-dmHSjVh] [-v VAR] [-a ALT] [-l VAR] [-r VAR] lr | rl | u dictionary_file output_file [acx_file]
  -d, --debug:               insert line numbers before each entry
  -m, --keep-boundaries:     keep morpheme boundaries
  -v, --var:                 set language variant
  -a, --alt:                 set alternative (monodix)
  -l, --var-left:            set left language variant (bidix)
  -r, --var-right:           set right language variant (bidix)
  -H, --hfst:                expect HFST symbols
  -S, --no-split:            don't attempt to split into word and punctuation sections
  -j, --jobs:                use one cpu core per section when minimising, new section after 50k entries
  -V, --verbose:             compile verbosely
  -h, --help:                print this message and exit
$ ls foo.bin
ls: cannot access 'foo.bin': No such file or directory
$ lsx-comp -d apertium-nno-nob.nob-nno.lsx foo.bin
lsx-comp v3.7.6: build a letter transducer from a dictionary
USAGE: lsx-comp [-dmHSjVh] [-v VAR] [-a ALT] [-l VAR] [-r VAR] lr | rl | u dictionary_file output_file [acx_file]
  -d, --debug:               insert line numbers before each entry
  -m, --keep-boundaries:     keep morpheme boundaries
  -v, --var:                 set language variant
  -a, --alt:                 set alternative (monodix)
  -l, --var-left:            set left language variant (bidix)
  -r, --var-right:           set right language variant (bidix)
  -H, --hfst:                expect HFST symbols
  -S, --no-split:            don't attempt to split into word and punctuation sections
  -j, --jobs:                use one cpu core per section when minimising, new section after 50k entries
  -V, --verbose:             compile verbosely
  -h, --help:                print this message and exit

mr-martian commented 7 months ago

lsx-comp -d lr apertium-nno-nob.nob-nno.lsx foo.bin (direction still required)
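For anyone scripting this, a small sketch of a helper that builds the argument list and refuses to omit the direction, which is the mistake that produced the usage message above. It only uses the `-d` flag and the `lr | rl | u` directions shown in that usage text. The helper name `lsx_comp_command` is hypothetical and not part of any Apertium tool.

```python
def lsx_comp_command(direction, source, output, debug=False):
    """Build an lsx-comp argument list; the direction is mandatory.

    Hypothetical helper: it only assembles the arguments, it does not run anything.
    """
    if direction not in ("lr", "rl", "u"):
        raise ValueError("direction must be one of lr, rl, u, got %r" % (direction,))
    cmd = ["lsx-comp"]
    if debug:
        cmd.append("-d")  # per the usage text: insert line numbers before each entry
    cmd += [direction, source, output]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where lsx-comp is installed.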

unhammer commented 7 months ago

Doh, sorry. I blame lack of sleep.

But that's pretty great :-) It would be nice if it also inserted an end-marker, but in the meantime I can just use a simplified XSLT.