dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Add support for the GRAF format #110

Closed: reckart closed this issue 5 years ago

reckart commented 9 years ago
Add support for the GRAF format (http://www.cs.vassar.edu/~ide/papers/LAW.pdf) using
the ANC UIMA Utils (http://www.anc.org/tools/uima-utils.html).

Original issue reported on code.google.com by richard.eckart on 2013-02-13 22:17:59

reckart commented 9 years ago
Added a GrafWriter component to the module. Leaving the issue open for now, since the
UimaUtils version used is a SNAPSHOT version.

Original issue reported on code.google.com by richard.eckart on 2013-02-13 22:49:08

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2013-02-21 09:50:59

reckart commented 9 years ago
This issue was updated by revision r1157.

- Added ANC snapshot repo for UimaUtils 3.0.0-SNAPSHOT

Original issue reported on code.google.com by richard.eckart on 2013-03-16 14:18:34

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2013-06-24 22:45:57

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2013-06-25 10:50:53

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2013-06-25 10:54:45

reckart commented 9 years ago
This issue was updated by revision r3328.

- Added skeleton code for a GrafReader (defunct)
- Working towards a round-trip unit test (hackedihack)

Original issue reported on code.google.com by richard.eckart on 2015-02-20 16:53:31

reckart commented 9 years ago
This issue was updated by revision r3341.

- License headers

Original issue reported on code.google.com by richard.eckart on 2015-02-21 17:00:16

reckart commented 9 years ago

The current state of my attempts to create UIMA Collection Reader and CAS Consumer components for GrAF in DKPro Core:

There are two tests:

The code for the GrafReader is also still very rough. I'm just trying to get the round-trip working before refining it. It is not meant to work for anything beyond the unit test.

The GrafWriter currently has no support for generating the header.xml file that is required by the GrafReader.

So these are currently my obstacles:

reckart commented 9 years ago

Briefly tested against GrAF UIMA Utils 3.1.1-SNAPSHOT: the type names still appear to be truncated, e.g. from de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token to type.Token, which prevents the re-import step of the round-trip.
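To illustrate why the truncation breaks the round-trip, here is a small, self-contained Java sketch. It does not use the ANC UIMA Utils or any UIMA API; `truncate`, `resolveExact` and `resolveBySuffix` are hypothetical helpers that only mimic the observed effect (keeping the last two segments of the type name) and show that a reader looking types up by their fully qualified name can no longer find them.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class TypeNameTruncationDemo {

    // Hypothetical: mimics the observed export behaviour, where only the last
    // two dot-separated segments of a fully qualified type name survive.
    static String truncate(String fullName) {
        int last = fullName.lastIndexOf('.');
        if (last < 0) {
            return fullName;
        }
        int secondLast = fullName.lastIndexOf('.', last - 1);
        return secondLast < 0 ? fullName : fullName.substring(secondLast + 1);
    }

    // Strict lookup, as a re-importing reader would do: the name must match exactly.
    static Optional<String> resolveExact(String name, Set<String> typeSystem) {
        return typeSystem.contains(name) ? Optional.of(name) : Optional.empty();
    }

    // Hypothetical workaround: accept a truncated name only if exactly one
    // known type ends with it; ambiguous suffixes stay unresolved.
    static Optional<String> resolveBySuffix(String name, Set<String> typeSystem) {
        List<String> matches = typeSystem.stream()
                .filter(t -> t.equals(name) || t.endsWith("." + name))
                .collect(Collectors.toList());
        return matches.size() == 1 ? Optional.of(matches.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> typeSystem = Set.of(
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token",
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence");

        String original = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token";
        String exported = truncate(original); // "type.Token"

        System.out.println("exported:  " + exported);
        System.out.println("exact:     " + resolveExact(exported, typeSystem).isPresent()); // false
        System.out.println("by suffix: " + resolveBySuffix(exported, typeSystem).orElse("<unresolved>"));
    }
}
```

The suffix-based lookup is only a stopgap: it fails as soon as two types in the type system share the same trailing segments, so preserving the fully qualified name in the export would presumably be the more robust fix.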

reckart commented 9 years ago

@ksuderman, just noticed you are on GitHub as well (let's see if this message reaches you).

ksuderman commented 9 years ago

@reckart Yes, it did reach me. I will try to find a few hours to track down the problem with the truncated type names.

reckart commented 5 years ago

Since the upstream GrAF Java code is no longer supported and does not work with UIMA v3, this issue will not be completed.