Alpino expects every sentence of tokenized input to be on a separate line (Alpino User Guide, Section 2.5 on page 7). mmax2raw.py, however, ignores this and writes all words of an MMAX file on a single line of the output file:
https://github.com/cltl/FormatConversions/blob/6810be2584b193fbf6624850dfe90371f79e1649/mmax2conll/mmax2raw.py#L81
Fixing this could improve tagging and therefore coreference results. Alpino also sometimes seems to output cyclic graphs as dependency "trees"; this issue may be the cause of that as well.
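
A rough sketch of what the fix could look like, assuming the words have already been grouped into sentences before they are written out. The function name `sentences_to_raw` and the list-of-token-lists input are hypothetical and are not taken from mmax2raw.py; how sentence boundaries are recovered from the MMAX data is left open here.

```python
def sentences_to_raw(sentences):
    """Render tokenized sentences as text with one sentence per line.

    `sentences` is assumed to be an iterable of token lists, e.g.
    [["Dit", "is", "een", "zin", "."], ["Nog", "een", "."]].
    Empty sentences are skipped so that no blank lines are written.
    """
    return "".join(
        " ".join(tokens) + "\n"
        for tokens in sentences
        if tokens
    )


if __name__ == "__main__":
    example = [
        ["Dit", "is", "een", "zin", "."],
        [],  # empty sentence, dropped from the output
        ["Nog", "een", "."],
    ]
    # Each sentence ends up on its own line, tokens separated by spaces.
    print(sentences_to_raw(example), end="")
```

The point is only that the newline is emitted per sentence rather than once per file; the rest of the conversion would stay as it is.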