dbpedia-spotlight / dbpedia-spotlight-model

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text. Improving Efficiency and Accuracy in Multilingual Entity Extraction approach
http://www.dbpedia-spotlight.org
Apache License 2.0

Annotation fails with String index out of range: 4 #73

Open acxcv opened 1 year ago

acxcv commented 1 year ago

Hi, I'm using Spotlight to annotate ~40k texts.

In around 3.5k instances, the annotation does not work as expected and Spotlight produces String index out of range: 4 instead of the annotation XML.

I can't find the reason why this happens. From what I can tell, the texts where Spotlight fails are similar in length and structure to those that work flawlessly.

I've tried removing all non-alphanumeric characters from sample texts that failed, but the error still persists.
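For reference, the stripping step can be sketched roughly as below. This is only an illustration: the helper name `strip_non_alphanumeric` is hypothetical, and it assumes "non-alphanumeric" means everything other than ASCII letters, digits, and whitespace.

```python
import re


def strip_non_alphanumeric(text: str) -> str:
    """Drop every character that is not an ASCII letter, digit, or whitespace.

    Hypothetical helper: characters are removed rather than replaced, so
    "they'd" becomes "theyd". Runs of whitespace are then collapsed.
    """
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_non_alphanumeric("they'd be fired!"))
```

Since the error persists on text cleaned this way, punctuation alone is unlikely to be the trigger.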

This is the last shell output I get on the REST server before the curl command returns the error:

```
...]
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black(DBpedia:Colour)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black_Canadians(Wikidata:Q41710,DBpedia:EthnicGroup)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]
```

Does anybody have an idea why this could happen? I can provide a text file containing the texts in question for reference.

I'm using Java 1.8.0, the dbpedia-spotlight-1.0.0 jar file, and the latest en core data release.

Thanks for your help!

Julio-Noe commented 1 year ago

Hi @acxcv ,

Thanks for sharing this issue. Please, try the Docker version of Spotlight; here is the link:

https://hub.docker.com/r/dbpedia/dbpedia-spotlight

If the problem persists, please share the text file and the last log output so we have a clue where to look for the problem. Thanks again, and have a nice day.
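To narrow down which texts to share, a small script along these lines could separate the failing inputs from the rest. It is a sketch under assumptions, not a tested reproduction: the endpoint `http://localhost:2222/rest/annotate` is guessed from the `Grizzly-2222` thread name in the log, the confidence value 0.45 is taken from the `(c=0.45)` log entries, and the helper names are hypothetical. It also assumes the error message appears in the response body, as described in the report.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: a local Spotlight REST server on port 2222; adjust as needed.
ENDPOINT = "http://localhost:2222/rest/annotate"
ERROR_MARKER = "String index out of range"


def is_failure(body: str) -> bool:
    """Return True if the response body contains the reported error message."""
    return ERROR_MARKER in body


def annotate(text: str, confidence: float = 0.45, endpoint: str = ENDPOINT) -> str:
    """POST one text to the annotate endpoint and return the raw response body."""
    data = urllib.parse.urlencode({"text": text, "confidence": confidence}).encode("utf-8")
    request = urllib.request.Request(endpoint, data=data, headers={"Accept": "text/xml"})
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server may report the error with a non-2xx status; keep its body.
        return err.read().decode("utf-8", errors="replace")


def collect_failures(texts, annotate_fn=annotate):
    """Return (index, text) pairs for every input whose response looks like the error."""
    return [(i, t) for i, t in enumerate(texts) if is_failure(annotate_fn(t))]
```

Running `collect_failures` over the ~40k texts and writing the returned pairs to a file would give a compact set of inputs to attach here and to retry against the Docker image.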