From Djame:
Hi Grzegorz,
I've been able to reproduce this bug, which occurs in two cases:
1° there are fewer than 3 fields of data per line (typical case: a punctuation
mark which lacks a lemma in a treebank, (PONCT .) instead of (PONCT .@.), so if
one rewrites the leaf to get word lemma pos, there will be only two fields and bang).
2° there are more than 3 fields (typical case: the (X (SYM @)) line in the
PTB, which is lemmatized as @^@; a script which works for French and Italian
(tr "@" "^" | tr '^' '\t') will then generate 4 fields:
@^@ SYM -> ^^^ SYM -> \t\t\tSYM and bang, morfette crashes (it took me one
night to catch this on my own data).
Solutions:
1) the best: make morfette more explicit (e.g. display the faulting line and
some context)
2) run a checker script:
http://pauillac.inria.fr/~seddah/check.pl
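
For illustration, here is a minimal sketch of what such a check could look like. It is not the check.pl script linked above, whose contents are not shown here. It assumes the training data is tab-separated with exactly three fields per line (word, lemma, pos) and that empty lines separate sentences; the function name find_bad_lines is made up for this example.

```python
EXPECTED_FIELDS = 3  # assumption: word, lemma and pos, separated by tabs


def find_bad_lines(lines, expected=EXPECTED_FIELDS):
    """Return a list of (line_number, field_count, text) for malformed lines.

    `lines` is any iterable of strings, e.g. an open file.
    Completely empty lines are skipped (assumed to be sentence separators).
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if text == "":
            continue
        # Split on every tab so that empty fields are counted too:
        # "\t\t\tSYM" gives four fields, three of them empty.
        count = len(text.split("\t"))
        if count != expected:
            bad.append((number, count, text))
    return bad
```

Calling find_bad_lines on an open training file and printing each returned tuple gives the line number, the field count and the offending line, which is the kind of context that makes both cases above quick to spot.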
Original issue reported on code.google.com by pitekus on 19 Dec 2011 at 3:26