GateNLP / gateplugin-Format_Brat

Support for loading/saving brat standoff annotations
GNU Lesser General Public License v3.0

IndexOutOfBounds when exporting Stanford CoreNLP Dependency annotations #1

Closed by drgriffis 6 years ago

drgriffis commented 6 years ago

I'm trying to export dependency annotations from the Stanford CoreNLP toolkit to BRAT format for web visualization. When "Dependency" is included as an Entity type in the BRAT configuration file, an ArrayIndexOutOfBoundsException is thrown while parsing the constructed annotation.

Full trace of relevant exception:

java.lang.ArrayIndexOutOfBoundsException: 2
    at gate.creole.brat.annotations.TextBound.<init>(TextBound.java:33)
    at gate.creole.brat.annotations.BratAnnotation.parse(BratAnnotation.java:39)
    at gate.creole.brat.Annotations.getBratAnnotations(Annotations.java:259)
    at gate.creole.brat.BratDocumentExporter.export(BratDocumentExporter.java:60)

Steps to reproduce:

  1. Create new single-document Corpus, with document text
    This document has two sentences. This is the second one.
  2. Load the following Creole plugins:
    • ANNIE (8.5)
    • Format: brat standoff (1.0-SNAPSHOT)
    • Stanford CoreNLP (8.5)
  3. Create Conditional Corpus Pipeline, with the following components (default Runtime Parameters for each PR):
    • Document Reset PR
    • ANNIE Sentence Splitter
    • ANNIE English Tokeniser
    • StanfordParser
  4. Run the pipeline
  5. Create annotations.conf BRAT configuration file with the following contents:
    [entities]
    Sentence
    Token
    Dependency
    [relations]
    [events]
    [attributes]
  6. Attempt to export processed document to brat Standoff Annotations, referencing the annotations.conf configuration file created in Step 5.

It may well be that I'm also not building my BRAT configuration file correctly. I haven't used BRAT before, so that may be the source of the error.

Thanks!

drgriffis commented 6 years ago

Forgot to specify: using GATE Developer 8.5 build 56b50da.

greenwoodma commented 6 years ago

It turns out this was definitely a bug, and it hit any GATE annotation without a string feature; quite how I hadn't caught it before, I'm not sure. Anyway, it should work now (the plugin is still 1.0-SNAPSHOT, so your existing test app will pick up the new changes automatically).
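
For anyone who finds this later, here is a minimal sketch of one way an error like this can arise in Java. It is an assumption for illustration, not the plugin's actual code: the class and method names (SplitDemo, naiveSplit, safeSplit) are hypothetical. A brat text-bound line has three tab-separated fields (ID, type plus offsets, text). If the text field is empty, String.split with no limit drops the trailing empty field, so reading index 2 would throw an ArrayIndexOutOfBoundsException.

```java
// Hypothetical illustration only; this is NOT the code in TextBound.java.
class SplitDemo {

    // Naive split: String.split(regex) discards trailing empty strings.
    static String[] naiveSplit(String line) {
        return line.split("\t");
    }

    // A negative limit keeps trailing empty fields, so index 2 always exists
    // for a line that contains two tab characters.
    static String[] safeSplit(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String withText = "T1\tToken 0 4\tThis";
        String withoutText = "T2\tDependency 0 4\t"; // empty text field

        System.out.println(naiveSplit(withText).length);    // 3
        System.out.println(naiveSplit(withoutText).length); // 2, so [2] would throw
        System.out.println(safeSplit(withoutText).length);  // 3, last field is ""
    }
}
```

Whether the actual fix used a split limit or handled the missing field some other way is not stated in this thread.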

On a related note, you might want to swap around the tokenizer and sentence splitter in your app, as the sentence splitter relies on having Token annotations. If you run the app as described in your comment, you'll notice you currently get a single sentence.