codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Use MappingProvider for constituents and dependencies #99

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Constituent and dependency types are still mapped based on class and 
package name. Use a mapping provider, as is done for POS tags.

Original issue reported on code.google.com by richard.eckart on 8 Nov 2012 at 4:42

GoogleCodeExporter commented 9 years ago
I have added mapping files for NEGRA grammatical functions and CONLL-2008 
dependency labels. Judith has added some suggestions on how these could be 
mapped to the coarse-grained grammatical functions used in Uby.

Unfortunately, I didn't find documentation on the grammatical functions used in 
the different versions of Tiger. Tiger seems to use a superset of the NEGRA 
labels and the set appears to differ between the different versions of Tiger. 
Does anybody have links to publications or documentation that specifies the 
Tiger labels or is it necessary to extract them directly from the corpus meta 
data?

Original comment by richard.eckart on 14 Aug 2013 at 9:19

GoogleCodeExporter commented 9 years ago
I found the following documentation about TIGER:
(source http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)

Annotation Manual: 
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf

Reference paper: 
http://link.springer.com/content/pdf/10.1007%2Fs11168-004-7431-3.pdf

Original comment by eckle.kohler on 14 Aug 2013 at 1:22

GoogleCodeExporter commented 9 years ago
Differences between NEGRA and TIGER regarding syntactic (= grammatical) 
functions:
TIGER introduces additional syntactic functions:
- OP for prepositional objects, i.e. prepositional phrases that are arguments 
  (in contrast to adjuncts) of verbs, nouns or adjectives
- PH and EP for different functions of expletive "es" in German

Original comment by eckle.kohler on 15 Aug 2013 at 6:11

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 15 Aug 2013 at 9:51

GoogleCodeExporter commented 9 years ago
(met with Sandra Kübler)

A MappingProvider for dependencies would
1) definitely be controversial because, in contrast to POS, there is much less 
agreement about common dependency types across languages

2) have to be carefully designed because (obviously) it entails 
information loss. So IMHO this would only make sense in the context of 
particular applications where you can demonstrate that the information loss 
does not matter and the generalization is beneficial for the application

Original comment by eckle.kohler on 8 Sep 2013 at 5:26

GoogleCodeExporter commented 9 years ago
Regarding 2): it doesn't entail information loss per se, because the original 
dependency information is preserved in a feature value. The mapping only 
applies to selecting a specialized type instead of the generic "Dependency" 
type.
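
A minimal sketch of that idea, for illustration only. All names here (DependencyMappingSketch, MappedDependency, the "Subject" and "DirectObject" type names, the "*" fallback key) are hypothetical and are not the DKPro Core API; SB and OA are NEGRA labels used as sample input. It only shows that the lookup picks a type while the original label is carried along unchanged.

```java
import java.util.Map;

// Hypothetical sketch, not DKPro Core code.
class DependencyMappingSketch {

    // Result of a mapping: the selected type plus the untouched original label.
    static final class MappedDependency {
        final String typeName;       // specialized type, or the generic fallback
        final String originalLabel;  // label exactly as produced by the parser

        MappedDependency(String typeName, String originalLabel) {
            this.typeName = typeName;
            this.originalLabel = originalLabel;
        }

        @Override
        public String toString() {
            return typeName + "[" + originalLabel + "]";
        }
    }

    // Assumed mapping table: tagset label -> specialized type name.
    // "*" is an assumed fallback entry for labels without a specialized type.
    static final Map<String, String> MAPPING = Map.of(
            "SB", "Subject",
            "OA", "DirectObject",
            "*", "Dependency");

    // Select a type for the label; the label itself is never altered or dropped.
    static MappedDependency map(String label) {
        String type = MAPPING.getOrDefault(label, MAPPING.get("*"));
        return new MappedDependency(type, label);
    }

    public static void main(String[] args) {
        System.out.println(map("SB"));
        System.out.println(map("OP"));
    }
}
```

An unmapped label such as OP falls back to the generic type but still keeps "OP" as its original value, so no information is lost by the type selection itself.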

Original comment by richard.eckart on 8 Sep 2013 at 9:15

GoogleCodeExporter commented 9 years ago
The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple 
Languages could be useful in this context.
See:
http://www.aclweb.org/anthology-new/W/W09/W09-1201.pdf
"we have prepared a unified format and data for
several very different languages, as a basis
for possible extensions towards other languages
and unified treatment of syntactic dependencies
and semantic role labeling across natural languages;"

have to look into the data, though

Original comment by eckle.kohler on 13 Sep 2013 at 6:20

GoogleCodeExporter commented 9 years ago
I believe they just unified the file format, not the tagset. We have readers 
and writers for the file format btw (io.conll).

Original comment by richard.eckart on 13 Sep 2013 at 6:31

GoogleCodeExporter commented 9 years ago
The Swedish Treebank provides an official conversion to the Stanford dependency 
types: http://stp.lingfil.uu.se/~nivre/swedish_treebank/

Original comment by richard.eckart on 15 Sep 2013 at 5:29

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 17 Sep 2013 at 2:40

GoogleCodeExporter commented 9 years ago
This looks very interesting:

http://www.ryanmcd.com/papers/treebanksACL2013.pdf
https://code.google.com/p/uni-dep-tb/

Original comment by richard.eckart on 19 Sep 2013 at 1:34

GoogleCodeExporter commented 9 years ago
The harmonized label set in 

http://www.ryanmcd.com/papers/treebanksACL2013.pdf

looks good. This label set is based on "the principle that content words take 
function words as dependents".

We could use it to create mappings for German and other languages for which we 
have dependency parsers integrated. The question is how straightforward it is, 
for languages other than English, to map the existing dependency tagsets to 
this uniform label set. 
Answering that requires looking into the individual dependency tagsets used in 
the different treebanks.

Original comment by eckle.kohler on 19 Sep 2013 at 2:51

GoogleCodeExporter commented 9 years ago
for future reference:

the paper about the Italian Stanford Dependency Treebank
http://medialab.di.unipi.it/downloads/ISDT/MIDT-STD2013_law.pdf

Original comment by eckle.kohler on 19 Sep 2013 at 4:28

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 26 Mar 2014 at 10:51

GoogleCodeExporter commented 9 years ago
Still not all components use the providers. Moving ahead again.

Original comment by richard.eckart on 12 Nov 2014 at 8:47