dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Use "universal" categories for dependencies #517

Open reckart opened 9 years ago

reckart commented 9 years ago
This is a fork of issue 99, where relevant aspects have previously been discussed. Here
are some relevant comments.

---

#1 richard.eckart

I have added mapping files for NEGRA grammatical functions and CONLL-2008 dependency
labels. Judith has added some suggestions on how these could be mapped to the coarse-grained
grammatical functions used in Uby.

Unfortunately, I didn't find documentation on the grammatical functions used in the
different versions of Tiger. Tiger seems to use a superset of the NEGRA labels and
the set appears to differ between the different versions of Tiger. Does anybody have
links to publications or documentation that specifies the Tiger labels or is it necessary
to extract them directly from the corpus meta data?
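
If the labels do have to be pulled from the corpus metadata, a minimal sketch could look like the following. It assumes the corpus is in TIGER-XML and that the header declares the edge labels under `head/annotation/edgelabel`; the inline sample document and the "known NEGRA" set are made up for illustration and are not taken from any real corpus release.

```python
import xml.etree.ElementTree as ET

# Made-up miniature header, assuming the TIGER-XML layout
# corpus/head/annotation/edgelabel/value.
SAMPLE = """<corpus id="sample">
  <head>
    <annotation>
      <edgelabel>
        <value name="SB">subject</value>
        <value name="OA">accusative object</value>
        <value name="OP">prepositional object</value>
      </edgelabel>
    </annotation>
  </head>
  <body/>
</corpus>"""


def edge_labels(xml_text):
    """Return {label: description} for all edge labels declared in the header."""
    root = ET.fromstring(xml_text)
    values = root.findall("./head/annotation/edgelabel/value")
    return {v.get("name"): (v.text or "").strip() for v in values}


labels = edge_labels(SAMPLE)

# Hypothetical, deliberately incomplete set of labels already covered by a mapping file.
known_negra = {"SB", "OA"}

# Labels present in this corpus version but missing from the mapping.
unmapped = sorted(set(labels) - known_negra)
```

Running the same comparison against each Tiger release would show which labels differ between versions.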

---

#2 eckle.kohler

I found the following documentation about TIGER:
(source http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)

Annotation Manual: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf

Reference paper: http://link.springer.com/content/pdf/10.1007%2Fs11168-004-7431-3.pdf

---

#3 eckle.kohler

Differences between NEGRA and TIGER regarding syntactic (=grammatical) functions:
TIGER introduces additional syntactic functions:
- OP for prepositional objects, i.e. prepositional phrases that are arguments (in contrast to
  adjuncts) of verbs, nouns or adjectives
- PH and EP for different functions of expletive "es" in German

---

#5 eckle.kohler

(met with Sandra Kübler)

A MappingProvider for dependencies
1) would definitely be controversial, because in contrast to POS, there is much less agreement
about common dependency types across languages

2) would have to be carefully designed, because (obviously) it entails information loss.
So this would IMHO only make sense in the context of particular applications where
you can demonstrate that the information loss does not matter and the generalization
is beneficial for the application

---

#6 richard.eckart

Regarding 2): it doesn't entail information loss per se, because the original dependency
label is preserved in a feature value. The mapping is only used to select
a specialized type instead of the generic "Dependency" type.
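
A minimal sketch of that idea, not the actual DKPro Core implementation: the class names, the `dependency_type` field, and the label-to-type table below are all hypothetical, chosen only to show that the original label survives whichever type is selected.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    """Generic dependency edge; dependency_type always holds the original label."""
    governor: str
    dependent: str
    dependency_type: str


class Subject(Dependency):
    """Hypothetical specialized type for subject relations."""


class Object(Dependency):
    """Hypothetical specialized type for object relations."""


# Illustrative mapping only, not the content of any real mapping file.
LABEL_TO_TYPE = {
    "SB": Subject,  # subject
    "OA": Object,   # accusative object
    "OP": Object,   # prepositional object (TIGER)
}


def create_dependency(governor, dependent, label):
    """Pick a specialized type if one is mapped, else fall back to Dependency.

    The original label is stored unchanged in either case.
    """
    cls = LABEL_TO_TYPE.get(label, Dependency)
    return cls(governor, dependent, label)


mapped = create_dependency("sieht", "Hund", "OA")
unmapped = create_dependency("sieht", "heute", "MO")
```

Here `mapped` is an `Object` whose `dependency_type` is still `"OA"`, while `unmapped` stays a plain `Dependency` carrying `"MO"`. An application that needs the fine-grained label can read it back; one that only needs coarse categories can dispatch on the type.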

---

#7 eckle.kohler

The CoNLL-2009 Shared Task "Syntactic and Semantic Dependencies in Multiple Languages"
could be useful in this context. See:
http://www.aclweb.org/anthology-new/W/W09/W09-1201.pdf
"we have prepared a unified format and data for
several very different languages, as a basis
for possible extensions towards other languages
and unified treatment of syntactic dependencies
and semantic role labeling across natural languages;"

I have to look into the data, though.

---

#8 richard.eckart
I believe they just unified the file format, not the tagset. We have readers and writers
for the file format btw (io.conll).

---

#9 richard.eckart
The Swedish Treebank provides an official conversion to the Stanford dependency types:
http://stp.lingfil.uu.se/~nivre/swedish_treebank/

---

#11 richard.eckart

This looks very interesting:

http://www.ryanmcd.com/papers/treebanksACL2013.pdf
https://code.google.com/p/uni-dep-tb/

Original issue reported on code.google.com by richard.eckart on 2014-11-12 10:05:57

reckart commented 9 years ago
New (2015-01-15): Download the version 1.0 treebanks from the LINDAT/CLARIN repository:
http://hdl.handle.net/11234/1-1464

Source: http://universaldependencies.github.io/docs/

Original issue reported on code.google.com by richard.eckart on 2015-01-20 19:04:09

judithek commented 9 years ago

Universal Stanford Dependencies are described in this LREC2014 paper: http://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf

a mapping of German dependency types might be very useful

reckart commented 9 years ago

The latest CoreNLP versions also produce universal dependencies for English (the "old" dependencies are now only optional).

I didn't look into it in detail, but mapping the "old" Stanford dependencies to the "new" dependencies might be non-trivial, because some edges may need to be split or conflated. See also: http://mailman.uib.no/public/corpora/2015-September/023188.html