UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

Does UD provide conversion tools? #62

BramVanroy closed this issue 3 years ago

BramVanroy commented 4 years ago

I am curious to compare UD models to others on a UD test set. The problem, of course, is that the other models use a different label set. Does UD provide scripts to convert, for instance, the dependency labels of OntoNotes and Penn to UD? Thanks in advance. (I am aware that conversion scripts will add noise, but I am fine with that.)

akoehn commented 4 years ago

There are conversion scripts; the Hamburg Dependency Treebank, for example, was converted to UD with an extensive rule set (see https://www.aclweb.org/anthology/W19-8006/). Other treebanks were converted similarly. There is no general conversion tool X -> UD and, at least in our case, the rules are written to cover the phenomena found in the treebank, so the quality on out-of-domain data will be worse.
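
To illustrate why treebank-specific rules degrade on unseen data, here is a minimal sketch of a label-only converter. It is not the HDT rule set or any existing UD tool: the rule table, the `KEEP` set, and the function name `convert_labels` are hypothetical and chosen for illustration. The renames shown (e.g. `dobj` to `obj`) are well-known UD v2 label changes; anything the rules do not cover falls back to the generic `dep` label and is counted, so the noise is at least visible. Real conversions also restructure trees (for instance, prepositions become `case` dependents in UD), which a relabeling pass like this cannot do.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# One token: (form, head index, dependency label). Hypothetical minimal format.
Row = Tuple[str, int, str]

# Hypothetical, deliberately tiny rule table: older label -> UD v2 label.
LABEL_RULES: Dict[str, str] = {
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "auxpass": "aux:pass",
    "csubjpass": "csubj:pass",
    "mwe": "fixed",
    "name": "flat",
}

# Labels assumed to carry over unchanged (an illustrative subset only).
KEEP: Set[str] = {"nsubj", "root", "det", "amod", "punct"}

FALLBACK = "dep"  # UD's label for an unspecified dependency


def convert_labels(rows: List[Row]) -> Tuple[List[Row], Counter]:
    """Relabel rows; return the converted rows and a count of uncovered labels."""
    converted: List[Row] = []
    uncovered: Counter = Counter()
    for form, head, label in rows:
        if label in LABEL_RULES:
            new_label = LABEL_RULES[label]
        elif label in KEEP:
            new_label = label
        else:
            # No rule covers this label: fall back and record it.
            new_label = FALLBACK
            uncovered[label] += 1
        converted.append((form, head, new_label))
    return converted, uncovered


if __name__ == "__main__":
    # "She reads books in bed", with Stanford-style prep/pobj attachment.
    sentence = [
        ("She", 2, "nsubj"),
        ("reads", 0, "root"),
        ("books", 2, "dobj"),
        ("in", 2, "prep"),
        ("bed", 4, "pobj"),
    ]
    rows, missed = convert_labels(sentence)
    for row in rows:
        print(row)
    print("uncovered labels:", dict(missed))
```

In this sketch `prep` and `pobj` end up as `dep` because fixing them requires reattaching heads, not just renaming labels. That is the kind of phenomenon a rule set only handles if the source treebank happened to contain it.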

amir-zeldes commented 4 years ago

We have a conversion from Stanford Dependencies to Universal Dependencies for English, which optionally takes advantage of additional annotations if available, including entity types (for flat), error annotations (for reparandum, Typo=Yes) and coreference (for dislocated), among other things. The conversion is described and evaluated here:

https://www.aclweb.org/anthology/W18-4918/

manning commented 3 years ago

We also have a conversion from Penn constituency treebanks to UDv2 as part of CoreNLP (see the source and Javadoc).