google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Use "universal" categories for syntactic dependencies #517

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This is a fork of issue 99, where relevant aspects were discussed previously. 
Here are some relevant comments.

---

#1 richard.eckart

I have added mapping files for NEGRA grammatical functions and CONLL-2008 
dependency labels. Judith has added some suggestions on how these could be 
mapped to the coarse-grained grammatical functions used in Uby.

Unfortunately, I didn't find documentation on the grammatical functions used in 
the different versions of Tiger. Tiger seems to use a superset of the NEGRA 
labels and the set appears to differ between the different versions of Tiger. 
Does anybody have links to publications or documentation that specifies the 
Tiger labels or is it necessary to extract them directly from the corpus meta 
data?
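
If the labels do have to be pulled from the corpus itself, a minimal sketch could look like the following. It is written in Python rather than the project's Java and assumes a TIGER-XML export whose header lists the edge labels as `<value name="...">` elements under `<edgelabel>`. That layout, the sample header, and the function name `edge_labels` are assumptions for illustration and should be checked against the actual corpus file of each Tiger version.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for a corpus header (assumed layout, not real corpus data).
SAMPLE_HEADER = """\
<corpus id="sample">
  <head>
    <annotation>
      <edgelabel>
        <value name="SB">subject</value>
        <value name="OP">prepositional object</value>
      </edgelabel>
    </annotation>
  </head>
</corpus>
"""


def edge_labels(xml_text):
    """Return {label: description} for every <value> listed under <edgelabel>."""
    root = ET.fromstring(xml_text)
    labels = {}
    for edgelabel in root.iter("edgelabel"):
        for value in edgelabel.findall("value"):
            labels[value.get("name")] = (value.text or "").strip()
    return labels


if __name__ == "__main__":
    for name, description in sorted(edge_labels(SAMPLE_HEADER).items()):
        print(name, description)
```

Running the same function over the headers of two Tiger releases and comparing the resulting key sets would show where the label inventories differ.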

---

#2 eckle.kohler

I found the following documentation about TIGER:
(source http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)

Annotation Manual: 
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf

Reference paper: 
http://link.springer.com/content/pdf/10.1007%2Fs11168-004-7431-3.pdf

---

#3 eckle.kohler

Differences between NEGRA and TIGER regarding syntactic (= grammatical) 
functions:
TIGER introduces additional syntactic functions:
- OP for prepositional objects, i.e. prepositional phrases that are arguments 
(in contrast to adjuncts) of verbs, nouns, or adjectives
- PH and EP for different functions of expletive "es" in German

---

#5 eckle.kohler

(met with Sandra Kübler)

A MappingProvider for dependencies
1) would definitely be controversial, because in contrast to POS, there is much 
less agreement about common dependency types across languages
2) would have to be carefully designed, because (obviously) it entails 
information loss. So this would IMHO only make sense in the context of 
particular applications where you can demonstrate that the information loss 
does not matter and the generalization is beneficial for the application

---

#6 richard.eckart

Regarding 2): it doesn't entail information loss per se, because the original 
dependency information is preserved in a feature value. The mapping is only 
used to select a specialized type instead of the generic "Dependency" type.
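
A minimal sketch of this idea, in Python rather than the project's Java. All names here (`COARSE_BY_LABEL`, `map_dependency`, the coarse category names) are hypothetical and not taken from the DKPro Core code; the three mapped labels are TIGER functions, and any label without a mapping falls back to the generic "Dependency" type while its original label is kept.

```python
from dataclasses import dataclass

# Hypothetical coarse categories; a real inventory would come from a mapping file.
COARSE_BY_LABEL = {
    "SB": "SUBJECT",
    "OA": "OBJECT",
    "OP": "PREP_OBJECT",
}
DEFAULT_TYPE = "Dependency"


@dataclass
class Dependency:
    governor: str
    dependent: str
    original_label: str  # the source label is always stored unchanged
    type_name: str = DEFAULT_TYPE


def map_dependency(governor, dependent, label, mapping=COARSE_BY_LABEL):
    """Pick a specialized type if the label is mapped, else the generic one."""
    return Dependency(governor, dependent, label, mapping.get(label, DEFAULT_TYPE))


if __name__ == "__main__":
    print(map_dependency("wartet", "auf", "OP"))
    print(map_dependency("regnet", "es", "EP"))
```

Because `original_label` is stored regardless of the mapping result, a consumer that needs the fine-grained label can still read it; only consumers that look at the type alone see the coarser view.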

---

#7 eckle.kohler

The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple 
Languages could be useful in this context.
See:
http://www.aclweb.org/anthology-new/W/W09/W09-1201.pdf
"we have prepared a unified format and data for
several very different languages, as a basis
for possible extensions towards other languages
and unified treatment of syntactic dependencies
and semantic role labeling across natural languages;"

I have to look into the data, though.

---

#8 richard.eckart
I believe they just unified the file format, not the tagset. We have readers 
and writers for the file format btw (io.conll).
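
To illustrate the difference between a shared file format and a shared tagset, here is a small Python sketch (not the io.conll reader) that collects the values of the DEPREL column from CoNLL-2009 style text. It assumes the tab-separated column order ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED; the two sample sentences and their labels are made up for illustration, and `row` and `deprel_counts` are hypothetical helper names.

```python
from collections import Counter

DEPREL_COLUMN = 10  # 0-based index of DEPREL in the assumed column layout


def row(token_id, form, head, deprel):
    """Build one token line, filling columns that are irrelevant here with '_'."""
    cols = [str(token_id), form] + ["_"] * 6 + [str(head), "_", deprel, "_", "_", "_"]
    return "\t".join(cols)


def deprel_counts(conll_text):
    """Count DEPREL values over all token lines, skipping blank separator lines."""
    counts = Counter()
    for line in conll_text.splitlines():
        if not line.strip():
            continue
        columns = line.split("\t")
        if len(columns) > DEPREL_COLUMN:
            counts[columns[DEPREL_COLUMN]] += 1
    return counts


# Illustrative data only: same column layout, different label inventories.
german = "\n".join([row(1, "Sie", 2, "SB"), row(2, "schläft", 0, "ROOT")])
english = "\n".join([row(1, "She", 2, "SBJ"), row(2, "sleeps", 0, "ROOT")])

if __name__ == "__main__":
    print(sorted(deprel_counts(german)))
    print(sorted(deprel_counts(english)))
```

Both samples parse with the same code, but the subject relation carries a different label in each, which is the gap a label mapping would have to close.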

---

#9 richard.eckart
The Swedish Treebank provides an official conversion to the Stanford dependency 
types: http://stp.lingfil.uu.se/~nivre/swedish_treebank/

---

#11 richard.eckart

This looks very interesting:

http://www.ryanmcd.com/papers/treebanksACL2013.pdf
https://code.google.com/p/uni-dep-tb/

Original issue reported on code.google.com by richard.eckart on 12 Nov 2014 at 10:05

GoogleCodeExporter commented 9 years ago
New (2015-01-15): Download the version 1.0 treebanks from the LINDAT/CLARIN 
repository:
http://hdl.handle.net/11234/1-1464

Source: http://universaldependencies.github.io/docs/

Original comment by richard.eckart on 20 Jan 2015 at 7:04