UniversalDependencies / UD_Dutch-Alpino

Dutch data.
Creative Commons Attribution Share Alike 4.0 International

Duplicate declaration of enhanced dependencies #2

Open bguil opened 6 years ago

bguil commented 6 years ago

There are several places where the same enhanced relation is declared twice. For instance:

1   Oranje  Oranje  PROPN   N|eigen|ev|basis|onz|stan   Gender=Neut|Number=Sing 15  nsubj   15:nsubj|15:nsubj   SpaceAfter=No

All occurrences (80) can be listed with: egrep "([0-9]+):([a-z:]+)\|(\1):(\2)" *.conllu

NB: the same problem also occurs in the other Dutch corpus, UD_Dutch-LassySmall (11 occurrences).
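
For readers who prefer a script to the egrep one-liner, the check could be sketched in Python roughly as follows. This is only an illustration, assuming tab-separated CoNLL-U lines with the enhanced dependencies in the ninth column (DEPS); the function names are hypothetical. Unlike the regular expression above, it also catches duplicates that are not adjacent and relations containing characters other than lowercase letters and colons, so its counts may differ from the 80 and 11 reported here.

```python
def duplicate_enhanced_deps(deps_field):
    """Return the head:deprel pairs that occur more than once in a DEPS value."""
    if deps_field == "_":  # "_" means no enhanced dependencies
        return []
    seen = set()
    dupes = []
    for item in deps_field.split("|"):
        if item in seen and item not in dupes:
            dupes.append(item)
        seen.add(item)
    return dupes


def find_duplicates(lines):
    """Return (token id, duplicated pairs) for every token line with duplicates.

    Assumes each token line has the 10 tab-separated CoNLL-U columns;
    comment lines, blank lines, and malformed lines are skipped.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10:
            continue
        dupes = duplicate_enhanced_deps(cols[8])
        if dupes:
            hits.append((cols[0], dupes))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_dupes.py file1.conllu file2.conllu ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for token_id, dupes in find_duplicates(handle):
                print(f"{path}\ttoken {token_id}\t{'|'.join(dupes)}")
```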

gossebouma commented 6 years ago

This is indeed a bug. I fixed it by forcing the enhanced deprels on each token to be unique. It should be in the dev branch soon. (It is a bit of a brute-force solution, but I am copying (modified) regular deps to the enhanced deps AND spreading (controlled) subjects across conjuncts, etc. Apparently, the rules sometimes produce enhanced deps that were already there in the regular deps. Instead of listing all of these as exceptions, filtering duplicates seems easier.)
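
The duplicate filtering described above could look something like the following Python sketch. It is not the actual conversion code used for the treebank; it is a stand-alone post-processing illustration that assumes tab-separated CoNLL-U lines with the enhanced dependencies in the ninth column (DEPS), and the function names are hypothetical. Keeping the first occurrence of each head:deprel pair preserves the existing order of the list, so a DEPS value that was sorted by head stays sorted.

```python
def dedupe_deps(deps_field):
    """Drop repeated head:deprel pairs from a DEPS value, keeping first occurrences."""
    if deps_field == "_":  # "_" means no enhanced dependencies
        return deps_field
    # dict.fromkeys keeps insertion order, so the original ordering survives
    unique = list(dict.fromkeys(deps_field.split("|")))
    return "|".join(unique)


def dedupe_line(line):
    """Return the line with duplicates removed from its DEPS column.

    Comment lines, blank lines, and lines without exactly 10 tab-separated
    columns are returned unchanged (minus the trailing newline).
    """
    stripped = line.rstrip("\n")
    cols = stripped.split("\t")
    if stripped.startswith("#") or len(cols) != 10:
        return stripped
    cols[8] = dedupe_deps(cols[8])
    return "\t".join(cols)
```

Applied to the token from the report, the DEPS value 15:nsubj|15:nsubj becomes 15:nsubj, and all other columns are left as they were.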