Open unhammer opened 7 months ago
See if lsx-comp --debug does what you want.
$ lsx-comp --debug apertium-nno-nob.nob-nno.lsx foo.bin
lsx-comp v3.7.6: build a letter transducer from a dictionary
USAGE: lsx-comp [-dmHSjVh] [-v VAR] [-a ALT] [-l VAR] [-r VAR] lr | rl | u dictionary_file output_file [acx_file]
-d, --debug: insert line numbers before each entry
-m, --keep-boundaries: keep morpheme boundaries
-v, --var: set language variant
-a, --alt: set alternative (monodix)
-l, --var-left: set left language variant (bidix)
-r, --var-right: set right language variant (bidix)
-H, --hfst: expect HFST symbols
-S, --no-split: don't attempt to split into word and punctuation sections
-j, --jobs: use one cpu core per section when minimising, new section after 50k entries
-V, --verbose: compile verbosely
-h, --help: print this message and exit
$ ls foo.bin
ls: cannot access 'foo.bin': No such file or directory
$ lsx-comp -d apertium-nno-nob.nob-nno.lsx foo.bin
lsx-comp v3.7.6: build a letter transducer from a dictionary
USAGE: lsx-comp [-dmHSjVh] [-v VAR] [-a ALT] [-l VAR] [-r VAR] lr | rl | u dictionary_file output_file [acx_file]
-d, --debug: insert line numbers before each entry
-m, --keep-boundaries: keep morpheme boundaries
-v, --var: set language variant
-a, --alt: set alternative (monodix)
-l, --var-left: set left language variant (bidix)
-r, --var-right: set right language variant (bidix)
-H, --hfst: expect HFST symbols
-S, --no-split: don't attempt to split into word and punctuation sections
-j, --jobs: use one cpu core per section when minimising, new section after 50k entries
-V, --verbose: compile verbosely
-h, --help: print this message and exit
lsx-comp -d lr apertium-nno-nob.nob-nno.lsx foo.bin
(direction still required)
Doh, sorry. I blame lack of sleep.
But that's pretty great :-) It would be nice if it also inserted an end-marker, but in the meantime I can just use a simplified xslt.
We could compile a (larger) fst that outputs line numbers in the stream for every top-level rule application.
nob-nno has a PoC using xsltproc: https://github.com/apertium/apertium-nno-nob/commit/6d2a7e05c18ce390eb98c0719c66a7286a6e8c5c but it would be nicer to have an lsx-comp --trace. (Also, nob-nno uses awk for line numbers, which will break if someone writes <e in the wrong place.)
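For what it's worth, the awk fragility could probably be avoided by asking an XML parser for the line numbers instead of counting lines that contain <e. Below is a minimal sketch of that idea in Python, using the standard library's expat bindings. It is not what the nob-nno PoC or lsx-comp does; the function name `entry_lines` and the sample dictionary are made up for illustration.

```python
import xml.parsers.expat


def entry_lines(xml_text, tag="e"):
    """Return the 1-based line on which each <tag> start tag begins.

    Hypothetical helper, not part of lttoolbox or the nob-nno PoC.
    """
    lines = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # Inside a handler, CurrentLineNumber refers to the first
        # character of the construct that triggered the event, so a
        # start tag spread over several lines is still counted once,
        # on the line where it opens.
        if name == tag:
            lines.append(parser.CurrentLineNumber)

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return lines


# Made-up lsx-like fragment: two entries open on the same line, and the
# second start tag continues onto the next line.
sample = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>a</l><r>b</r></p></e> <e
       c="odd layout"><p><l>c</l><r>d</r></p></e>
  </section>
</dictionary>
"""

if __name__ == "__main__":
    print(entry_lines(sample))
```

Since the parser tracks positions itself, the result does not depend on how the entries are laid out in the file, which is the case where a line-based awk script goes wrong.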