dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

BratReader crashes when an annotation covers more than two spans of text #1378

Closed alaindesilets closed 5 years ago

alaindesilets commented 5 years ago

Use the attached Maven project to reproduce the bug: TestBratNewlineBug.zip

Run the ReadBrat program, feeding it the path to the docs directory (also contained in the Maven project).

You will then get the following exception:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at de.tudarmstadt.ukp.dkpro.core.io.brat.internal.model.BratTextAnnotation.generateOffsetsString(BratTextAnnotation.java:132)
    at de.tudarmstadt.ukp.dkpro.core.io.brat.internal.model.BratTextAnnotation.parse(BratTextAnnotation.java:152)
    at de.tudarmstadt.ukp.dkpro.core.io.brat.internal.model.BratAnnotationDocument.read(BratAnnotationDocument.java:54)
    at de.tudarmstadt.ukp.dkpro.core.io.brat.BratReader.readAnnotations(BratReader.java:192)
    at de.tudarmstadt.ukp.dkpro.core.io.brat.BratReader.getNext(BratReader.java:184)
    at de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:41)
    at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:100)
    at ReadBrat.main(ReadBrat.java:24)

This bug has been found to occur on both OSX and Windows.

The problem occurs on line 132 of BratTextAnnotation. That line references offsetsList.get(i - 1), but that index does not exist when nothing was added to offsetsList in the previous loop iteration.

The solution might be to replace

offsetsList.get(i - 1).setEnd(effectiveEnd);

with

offsetsList.get(offsetsList.size() - 1).setEnd(effectiveEnd);

We have tested this fix on a cloned version of dkpro-core and it seems to do the trick.
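To illustrate why the index matters, here is a minimal, self-contained sketch of the failure mode. It is not the actual BratTextAnnotation code: the class OffsetMergeSketch, the Offsets holder, and the merge rule (a fragment that starts at most one character after the previous one ends is folded into the last stored entry) are all assumptions made for illustration. Only the two offsetsList.get(...) calls are taken from this report.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetMergeSketch {
    /** Hypothetical stand-in for one begin/end fragment of a text annotation. */
    static final class Offsets {
        final int begin;
        int end;

        Offsets(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        void setEnd(int end) {
            this.end = end;
        }

        @Override
        public String toString() {
            return begin + " " + end;
        }
    }

    /**
     * Assumed merge rule: a fragment separated from the previous one by at most
     * one character (e.g. a newline) extends the last stored entry instead of
     * being added as a new one.
     */
    static List<Offsets> merge(int[][] fragments, boolean useFix) {
        List<Offsets> offsetsList = new ArrayList<>();
        for (int i = 0; i < fragments.length; i++) {
            int begin = fragments[i][0];
            int effectiveEnd = fragments[i][1];
            boolean adjacent = i > 0 && begin - fragments[i - 1][1] <= 1;
            if (!adjacent) {
                offsetsList.add(new Offsets(begin, effectiveEnd));
            } else if (useFix) {
                // Proposed fix: always extend the last entry actually stored.
                offsetsList.get(offsetsList.size() - 1).setEnd(effectiveEnd);
            } else {
                // Original indexing: assumes one list entry per fragment seen so far.
                offsetsList.get(i - 1).setEnd(effectiveEnd);
            }
        }
        return offsetsList;
    }

    public static void main(String[] args) {
        // Three fragments, each separated from the previous one by a single character.
        int[][] fragments = { { 0, 5 }, { 6, 10 }, { 11, 15 } };
        System.out.println(merge(fragments, true));
        try {
            merge(fragments, false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("original indexing failed: " + e.getMessage());
        }
    }
}
```

With three adjacent fragments, the second one is merged into the first, so the list still holds one entry when the third fragment asks for index 1. That is consistent with the "Index: 1, Size: 1" message in the stack trace, assuming the real code merges fragments in a comparable way.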

reckart commented 5 years ago

@alaindesilets would you like to do a PR? (see contribution guidelines)

alaindesilets commented 5 years ago

Sure. I'll see about getting a contribution agreement in place from my employer.

On Thu, Jun 6, 2019 at 2:43 AM Richard Eckart de Castilho <notifications@github.com> wrote:

> @alaindesilets would you like to do a PR? (see contribution guidelines https://dkpro.github.io/contributing/)


alaindesilets commented 5 years ago

Hum... it sounds like it would take weeks to have the corporate contribution agreement signed by the NRC president. Probably best if you guys do the change yourselves.
