dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Constituent mapping for NEGRA tagset #486

reckart closed 9 years ago

reckart commented 9 years ago
Constituent mapping for NEGRA tagset.

Original issue reported on code.google.com by richard.eckart on 2014-10-02 14:41:33

reckart commented 9 years ago
This issue was updated by revision r2881.

- Added some more mappings

Original issue reported on code.google.com by richard.eckart on 2014-10-04 22:23:23

reckart commented 9 years ago
I'll resolve this for the time being. If anybody wants to review the (incomplete) mappings
before 1.7.0, please do so. Otherwise, we can open a new issue for the next version
if desired.

Original issue reported on code.google.com by richard.eckart on 2014-11-12 08:42:59

reckart commented 9 years ago
Is there any documentation about the DKPro Core types for constituency
(e.g. where do SBAR, SINV etc. come from)?
I did not find any, not even in the DKPro Core book.

Otherwise I would volunteer to review the mapping. I had a brief look and it seems to
be a mapping where a lot of useful information is lost.

Original issue reported on code.google.com by eckle.kohler on 2014-11-12 09:55:40

reckart commented 9 years ago
Afaik the DKPro Core constituent types correspond to the Penn Treebank constituent types
as produced by the Stanford parser. 

See: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
See: https://dkpro-core-asl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.syntax-asl/src/main/resources/de/tudarmstadt/ukp/dkpro/core/api/syntax/tagset/en-ptb-constituency.map

Issue 516 (formerly issue 99) contains some discussion about adopting more "universal"
tags, but this hasn't proceeded since.

Original issue reported on code.google.com by richard.eckart on 2014-11-12 10:08:14

reckart commented 9 years ago
This link is broken: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html

I used this one instead:
http://www.sfs.uni-tuebingen.de/~dm/07/autumn/795.10/ptb-annotation-guide/root.html

Original issue reported on code.google.com by eckle.kohler on 2014-11-12 10:26:52

reckart commented 9 years ago
A better overview, with less detail:
http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html

Original issue reported on code.google.com by eckle.kohler on 2014-11-12 10:28:15

reckart commented 9 years ago
I reviewed the mapping.
The mapping is designed to lose information, which is due to incompatible tagsets
(the Penn tagset does not make distinctions that are present in NEGRA).
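
To make the information loss concrete, here is a minimal sketch of a many-to-one tag mapping. The tag pairs, the fallback type `X`, and the helper names `map_tag` and `collapsed_groups` are illustrative assumptions, not the entries or API of the actual DKPro Core mapping file.

```python
from collections import defaultdict

# Hypothetical excerpt: NEGRA-style constituent tags -> Penn-Treebank-style types.
# These pairs are assumptions for illustration, not the mapping shipped with DKPro Core.
NEGRA_TO_PTB = {
    "NP": "NP",    # noun phrase
    "CNP": "NP",   # coordinated noun phrase
    "MPN": "NP",   # multi-word proper noun
    "AP": "ADJP",  # adjective phrase
    "CAP": "ADJP", # coordinated adjective phrase
    "S": "S",      # sentence
}

# Assumed fallback type for tags that have no entry in the mapping.
FALLBACK = "X"


def map_tag(tag):
    """Return the target type for a source tag, or the fallback if unmapped."""
    return NEGRA_TO_PTB.get(tag, FALLBACK)


def collapsed_groups(mapping):
    """Return target types that receive more than one source tag.

    Each such group marks a distinction in the source tagset that cannot
    be recovered from the mapped output.
    """
    groups = defaultdict(list)
    for source, target in mapping.items():
        groups[target].append(source)
    return {target: sorted(sources)
            for target, sources in groups.items()
            if len(sources) > 1}


if __name__ == "__main__":
    for target, sources in collapsed_groups(NEGRA_TO_PTB).items():
        print(f"{target} <- {', '.join(sources)}")
```

Any target type listed by `collapsed_groups` stands for several source tags at once, so after mapping it is no longer possible to tell, for example, a coordinated noun phrase from a plain one.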

I think Issue 516 (formerly issue 99) is not related because it is about dependency
types, whereas this issue is about constituent types.

Original issue reported on code.google.com by eckle.kohler on 2014-11-12 13:21:43

reckart commented 9 years ago
Issue 517 is about dependencies.
Issue 516 is about constituents.

I might have done a bad job of separating the comments from issue 99 into
the two new issues, though, and there is also some overlap.

Original issue reported on code.google.com by richard.eckart on 2014-11-12 14:20:02

reckart commented 9 years ago
This issue was updated by revision r3008.

- Added/changed 5 mappings based on expert-feedback from JEK

Original issue reported on code.google.com by richard.eckart on 2014-11-12 14:42:31

reckart commented 9 years ago
I've just created a 1.7.x branch. Should r3008 be merged into it?

Original issue reported on code.google.com by pedrobssantos on 2014-11-12 14:48:05

reckart commented 9 years ago
Yes, please ;)

Original issue reported on code.google.com by richard.eckart on 2014-11-12 16:42:53

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2014-11-12 16:52:10

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by pedrobssantos on 2014-11-12 16:57:28