acxcv opened this issue 1 year ago (status: Open)
Hi @acxcv,
Thanks for sharing this issue. Please try the Docker version of Spotlight; here is the link:
https://hub.docker.com/r/dbpedia/dbpedia-spotlight
If the problem persists, please share the text file and the last log output so we have a clue where to look for the problem. Thanks again, and have a nice day.
Hi, I'm using Spotlight to annotate ~40k texts.
In around 3.5k instances, the annotation does not work as expected and Spotlight produces
String index out of range: 4
instead of the annotation XML. I can't find the reason why this happens. From what I can tell, the texts where Spotlight fails are similar in length and structure to those that work flawlessly.
I've tried removing all non-alphanumeric characters from sample texts that failed, but the error still persists.
```
...]
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black(DBpedia:Colour)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black_Canadians(Wikidata:Q41710,DBpedia:EthnicGroup)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]
```
This is the last shell output I'm getting on the REST server before the CURL command returns the error.
Does anybody have an idea why this could happen? I can provide a text file containing the texts in question for reference.
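In case it helps anyone reproduce this, here is a minimal sketch of how the failing texts could be separated from the working ones. It is not the exact script I run. The URL `http://localhost:2222/rest/annotate` is an assumption about a local server setup, the confidence of 0.45 is taken from the log above, and `annotate` and `split_failures` are hypothetical helper names. Since I'm not sure whether the server sends the error with a normal or an error HTTP status, the sketch reads the body in both cases.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: a Spotlight REST server is running locally on this port.
SPOTLIGHT_URL = "http://localhost:2222/rest/annotate"
# The message that comes back instead of the annotation XML.
ERROR_MARKER = "String index out of range"


def annotate(text, confidence=0.45, url=SPOTLIGHT_URL):
    """POST one text to the server and return the raw response body."""
    data = urllib.parse.urlencode(
        {"text": text, "confidence": confidence}
    ).encode("utf-8")
    request = urllib.request.Request(url, data=data, headers={"Accept": "text/xml"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # An error status may still carry the message in its body.
        return err.read().decode("utf-8", errors="replace")


def split_failures(texts, annotate_fn=annotate):
    """Return (ok, failed) lists of (index, text) pairs."""
    ok, failed = [], []
    for index, text in enumerate(texts):
        body = annotate_fn(text)
        if ERROR_MARKER in body:
            failed.append((index, text))
        else:
            ok.append((index, text))
    return ok, failed
```

Calling `split_failures(texts)` on the full list and writing the `failed` pairs to a file would give a reference file that contains only the problematic texts together with their positions in the original input.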
I'm using Java 1.8.0, the dbpedia-spotlight-1.0.0 JAR file, and the latest en core data release.
Thanks for your help!