arne-cl / discoursegraphs

linguistic converter / merging tool for multi-level annotated corpora. graph-based (using Python and NetworkX).
BSD 3-Clause "New" or "Revised" License

conll: coreference IDs aren't consistent (mmax->conll vs. tiger/mmax->conll) #94

Open arne-cl opened 10 years ago

arne-cl commented 10 years ago

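Running make conll produces different coreference IDs for the same document, depending on the input layers (mmax only vs. tiger + mmax). The first chain (tokens 1-2) is labeled 6 in one file and 2 in the other: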

/tmp/dg/maz-1423.mmax_only.conll

1   Die _   _   _   _   _   _   _   _   _   _   _   _   (6
2   Diskussion  _   _   _   _   _   _   _   _   _   _   _   _   6)
3   ,   _   _   _   _   _   _   _   _   _   _   _   _   _
4   wie _   _   _   _   _   _   _   _   _   _   _   _   _
5   teuer   _   _   _   _   _   _   _   _   _   _   _   _   _
6   die _   _   _   _   _   _   _   _   _   _   _   _   (1
7   neue    _   _   _   _   _   _   _   _   _   _   _   _   1
8   Wittstocker _   _   _   _   _   _   _   _   _   _   _   _   1|(0)
9   Stadthalle  _   _   _   _   _   _   _   _   _   _   _   _   1)

/tmp/dg/maz-1423.tiger_mmax.conll

1   Die _   _   _   _   _   _   _   _   _   _   _   _   (2
2   Diskussion  _   _   _   _   _   _   _   _   _   _   _   _   2)
3   ,   _   _   _   _   _   _   _   _   _   _   _   _   _
4   wie _   _   _   _   _   _   _   _   _   _   _   _   _
5   teuer   _   _   _   _   _   _   _   _   _   _   _   _   _
6   die _   _   _   _   _   _   _   _   _   _   _   _   (1
7   neue    _   _   _   _   _   _   _   _   _   _   _   _   1
8   Wittstocker _   _   _   _   _   _   _   _   _   _   _   _   1|(0)
9   Stadthalle  _   _   _   _   _   _   _   _   _   _   _   _   1)

arne-cl commented 10 years ago

Tried iterating over sorted(docgraph.nodes_iter()) in select_nodes_by_layer(). That made the IDs consistent for 1423, but not for 15734.
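
One possible direction, as a rough sketch only: instead of relying on node iteration order, number the chains by the position of their first mention in the token sequence. The function name assign_chain_ids, the chain keys and the span data below are made up for illustration and are not part of discoursegraphs. The sketch assumes that each chain is available as a list of (first_token, last_token) spans.

```python
def assign_chain_ids(chains):
    """Map each chain key to an integer ID, ordered by the chain's first mention.

    `chains` maps an arbitrary (hypothetical) chain key to a list of
    (first_token_index, last_token_index) mention spans.
    """
    # Sort chains by their earliest mention span, so the result does not
    # depend on dict/set iteration order or on the chain keys themselves.
    # Assumes no two chains share the same earliest span.
    ordered = sorted(chains, key=lambda key: min(chains[key]))
    return {key: new_id for new_id, key in enumerate(ordered)}


# The same three chains with different keys and insertion order, as if they
# came from two differently merged graphs (invented example data).
mmax_only = {"set_6": [(1, 2)], "set_1": [(6, 9)], "set_0": [(8, 8)]}
tiger_mmax = {"c": [(8, 8)], "a": [(6, 9)], "b": [(1, 2)]}

ids_a = assign_chain_ids(mmax_only)
ids_b = assign_chain_ids(tiger_mmax)
```

With this numbering, the chain starting at token 1 gets the same ID in both inputs, no matter how the underlying nodes are named or ordered. The IDs would still change if one input contained chains that the other lacks.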