dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Universal phrase tag set (constituents) #268

Open reckart opened 9 years ago

reckart commented 9 years ago
This might be something for coarse grained tags on constituents:

https://github.com/aaronlifenghan/A-Universal-Phrase-Tagset

Original issue reported on code.google.com by richard.eckart on 2013-10-23 17:39:30

reckart commented 9 years ago
Most of it looks good and will work for German as well (and maybe for other languages,
but that has to be investigated).

However, I find the naming of these two tags not well motivated:

CONJP (conjunction phrase), COP (coordinated phrase)

and the mapping they defined does not make sense to me:

PTB CONJP -> CONJP

PTB UCP -> COP (UCP stands for unlike coordination, i.e. the conjuncts are not of the same syntactic category)
French Treebank COORD -> COP
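
For reference, the three mappings questioned above can be written down as a small lookup table. This is only a sketch: the names `PHRASE_TAG_MAPPING`, `to_universal` and `sources_for` are made up for illustration and are not part of DKPro Core or of the tagset authors' materials, and "FTB" is just a shorthand chosen here for the French Treebank. It covers only the three entries listed in this comment, not the full mapping.

```python
# Hypothetical sketch of the three mappings discussed above.
# Keys are (treebank, original tag); values are the universal phrase tag.
# "FTB" is an illustrative shorthand for the French Treebank.
PHRASE_TAG_MAPPING = {
    ("PTB", "CONJP"): "CONJP",
    ("PTB", "UCP"): "COP",
    ("FTB", "COORD"): "COP",
}


def to_universal(treebank, tag, default=None):
    """Return the universal phrase tag for a treebank-specific tag.

    Falls back to `default` when the pair is not in the table.
    """
    return PHRASE_TAG_MAPPING.get((treebank, tag), default)


def sources_for(universal_tag):
    """List all (treebank, tag) pairs that collapse into one universal tag."""
    return sorted(
        source
        for source, target in PHRASE_TAG_MAPPING.items()
        if target == universal_tag
    )


if __name__ == "__main__":
    print(to_universal("PTB", "UCP"))
    print(sources_for("COP"))
```

Listing the sources per target makes the concern visible: PTB UCP (unlike coordination only) and French Treebank COORD end up under the same COP tag, while PTB CONJP is kept apart.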

see slides of the paper: http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation

background on coordination:
http://faculty.washington.edu/fxia/LAWVI/workshop_presentation_slides/paper_session5/coordination.pdf

they define coordination as
"A syntactic structure consisting of two or more elements
(conjuncts), with one or more conjuncts typically, but not
always preceded by a coordinating conjunction"

-- Judith

Original issue reported on code.google.com by eckle.kohler on 2013-10-23 18:47:31

reckart commented 9 years ago
If the idea of universal tags catches on, there'll probably be more work on this. We
can collect further references here until we feel it is a good time to act. In any
case, we could also just define our own categories. After all, we're already collecting
tag set definitions for various tag sets which we could compare at some point. However,
I don't see us doing a real evaluation of the impact such categories could have,
as e.g. Petrov et al. did for the POS tags.

Original issue reported on code.google.com by richard.eckart on 2013-10-23 18:51:28

reckart commented 9 years ago
Dear Judith,

the address should be:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation-41998520

instead of:
http://de.slideshare.net/AaronHanLiFeng/pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application-in-machine-translation-evaluation

Best regards.
NLPer

Original issue reported on code.google.com by hanlifengaaron on 2014-11-25 12:46:12

reckart commented 9 years ago
A Universal Phrase Tagset for Multilingual Treebanks
Springer
October 20, 2014
Many syntactic treebanks and parser toolkits are developed in the past twenty years,
including dependency structure parsers and phrase structure parsers. For the phrase
structure parsers, they usually utilize different phrase tagsets for different languages,
which results in an inconvenience when conducting the multilingual research. This paper
designs a refined universal phrase tagset that contains 9 commonly used phrase categories.
Furthermore, the mapping covers 25 constituent treebanks and 21 languages. The experiments
show that the universal phrase tagset can generally reduce the costs in the parsing
models and even improve the parsing accuracy.

In M. Sun et al. (Eds.): CCL and NLP-NABD 2014, LNAI 8801, pp. 247–258, 2014.
© Springer International Publishing Switzerland 2014

http://link.springer.com/chapter/10.1007%2F978-3-319-12277-9_22

Original issue reported on code.google.com by richard.eckart on 2015-01-16 21:50:39