dhfbk / tint

The Italian NLP Tool
http://tint.fbk.eu
GNU General Public License v3.0

StringIndexOutOfBoundsException triggered by a single character 'c' #26

Closed: vicawil closed this issue 6 years ago

vicawil commented 6 years ago

I ran into this while processing noisy text (inline XML-like tags, sometimes inside tokens) in which a 'c' was separated from a token by a punctuation character. The minimal input that triggers the exception is the single character "c".

EXECUTION FAILED: String index out of range: -2
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at java.lang.String.substring(String.java:1967)
    at eu.fbk.dh.tint.digimorph.annotator.GuessModel.guess(GuessModel.java:321)
    at eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator.annotate(DigiLemmaAnnotator.java:212)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:599)
    at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:119)
    at eu.fbk.dh.tint.runner.TintPipeline.run(TintPipeline.java:128)
    at eu.fbk.dh.tint.runner.TintPipeline.run(TintPipeline.java:179)
    at eu.fbk.dh.tint.runner.TintRunner.main(TintRunner.java:124)
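
The -2 in the trace suggests that an index or length passed to String.substring was computed from the token's length and went negative for a one-character token. The following is only a minimal sketch of that failure pattern, not Tint's actual GuessModel code: the class SuffixGuard, both helper methods, and the suffix length of 3 are hypothetical names and values chosen for illustration.

```java
final class SuffixGuard {

    // Hypothetical helper (not from Tint): returns the last n characters of a token.
    // For token = "c" and n = 3 the start index is 1 - 3 = -2, so substring throws.
    static String unsafeSuffix(String token, int n) {
        return token.substring(token.length() - n);
    }

    // Guarded variant: clamps the start index at 0, so tokens shorter than n
    // are returned whole instead of raising an exception.
    static String safeSuffix(String token, int n) {
        int start = Math.max(0, token.length() - n);
        return token.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(safeSuffix("c", 3));
        System.out.println(safeSuffix("casa", 3));
        try {
            unsafeSuffix("c", 3);
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With the clamp in place, safeSuffix("c", 3) returns "c" and safeSuffix("casa", 3) returns "asa", while the unguarded version throws for any token shorter than the requested suffix.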

ziorufus commented 6 years ago

This works for me, too. I'm using the develop branch.

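// Assumes `pipeline` is an already loaded TintPipeline instance.
String text = "c";
// Run the full annotation pipeline on the raw input.
Annotation annotation = pipeline.runRaw(text);
// Serialize the annotation to JSON and print it.
String json = JSONOutputter.jsonPrint(annotation);
System.out.println(json);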

vicawil commented 6 years ago

Oops, correct, apparently only the stable version is affected.