Closed AMR-KELEG closed 5 years ago
Should negative state ids be allowed?
The type at least is signed https://github.com/apertium/lttoolbox/blob/master/lttoolbox/transducer.h#L46 – I don't know if there is any code that depends on it being positive. OTOH, I've never seen negative state ids. How does it fail on negative state ids? (If it crashes or hangs, that's better than giving a wrong result …)
Why does it say "final@inconditional" when it compiles inputs that have no punctuation characters?
$ lttoolbox/lt-comp lr /tmp/simple.att /tmp/simple2.bin
Warning: Multiple fsts in '/tmp/simple.att' will be disjuncted.
main@standard 5 4
final@inconditional 3 2
Also, it no longer prints the disjuncted ones:
$ lt-print /tmp/simple2.bin
Error: empty set of final states
(which is weird, it does analyse the inputs – did it add an extra transducer without final states?) Fortunately printing works for the plain .att's.
On the plus side, there's no noticeable speed difference even with two passes over the file (~1.85s vs ~1.75s on a 13M .att file).
The patch just adds a new initial state. I commented out the line that terminates lt-print, and here is the full output:
$ cat sample.att
0 1 i i
1 2 s s
2 3 n n
3 4 ' '
4 5 t t
5 1.00
--
0 1 w w
1 2 e e
2 3 ' '
3 4 l l
4 5 l l
5 2.00
$ lt-print sample.bin
Error: empty set of final states
0 1 ε ε 0.000000
0 2 ε ε 0.000000
--
0 1 ε ε 0.000000
0 7 ε ε 0.000000
1 2 i i 0.000000
2 3 s s 0.000000
3 4 n n 0.000000
4 5 ' ' 0.000000
5 6 t t 0.000000
6 13 ε ε 1.000000
7 8 w w 0.000000
8 9 e e 0.000000
9 10 ' ' 0.000000
10 11 l l 0.000000
11 12 l l 0.000000
12 13 ε ε 2.000000
13 0.000000
I am not sure why the final@inconditional transducer is extracted!
Additionally, the two epsilon transitions in it are strange and don't represent actual transitions.
Seems like it's a bug in the lt-comp command.
I believe my patch isn't the source of this bug.
Additionally, the patch will fail for negative state ids. I can open another issue so that we don't forget this current limitation and fix it later.
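For reference, here is a minimal Python sketch of the disjunction idea (one new initial state with an epsilon transition into each sub-FST). It is not the code from the patch. The names `parse_att` and `disjunct` are made up for illustration, it assumes whitespace-separated fields (so a literal space symbol would not survive), and it assumes the source state of the first line of each FST is that FST's initial state. Because it renumbers states through a dictionary instead of adding an offset, the sign of the original state ids doesn't matter here. It also keeps final weights on the final states rather than moving them onto epsilon transitions like the lt-print output above.

```python
import re

EPS = "ε"

def parse_att(text):
    """Split ATT text on '--' lines; return a list of (initial, transitions, finals)."""
    fsts = []
    for chunk in re.split(r"^--$", text, flags=re.MULTILINE):
        rows = [line.split() for line in chunk.splitlines() if line.strip()]
        if not rows:
            continue
        transitions, finals = [], {}
        for row in rows:
            if len(row) <= 2:
                # Final-state line: "state [weight]"
                finals[int(row[0])] = float(row[1]) if len(row) == 2 else 0.0
            else:
                # Transition line: "src dst input output [weight]"
                weight = float(row[4]) if len(row) > 4 else 0.0
                transitions.append((int(row[0]), int(row[1]), row[2], row[3], weight))
        # Assumption: the first line's source state is the initial state.
        fsts.append((int(rows[0][0]), transitions, finals))
    return fsts

def disjunct(fsts):
    """Union all FSTs under a fresh initial state 0, renumbering every state."""
    transitions, finals = [], {}
    next_id = 1  # 0 is reserved for the new shared initial state
    for initial, fst_transitions, fst_finals in fsts:
        # Renumber in order of first appearance; any integer id is a valid key,
        # so negative ids are handled the same way as positive ones.
        mapping = {}
        seen = [initial] + [s for t in fst_transitions for s in t[:2]] + list(fst_finals)
        for old in seen:
            if old not in mapping:
                mapping[old] = next_id
                next_id += 1
        # Epsilon transition from the shared initial state into this FST.
        transitions.append((0, mapping[initial], EPS, EPS, 0.0))
        for src, dst, isym, osym, weight in fst_transitions:
            transitions.append((mapping[src], mapping[dst], isym, osym, weight))
        for state, weight in fst_finals.items():
            finals[mapping[state]] = weight
    return transitions, finals
```

Running `disjunct(parse_att(open("sample.att").read()))` on the sample.att above numbers the first FST's states 1 to 6 and the second's 7 to 12, with epsilon transitions 0 → 1 and 0 → 7, which matches the state numbering in the second section of the lt-print output.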
References #59. Fixes #56.