dice-group / LIMES

Link Discovery Framework for Metric Spaces.
https://limes.demos.dice-research.org/
GNU Affero General Public License v3.0
126 stars, 54 forks

StringIndexOutOfBoundsException #236

Closed by KonradHoeffner 4 years ago

KonradHoeffner commented 4 years ago

Running LIMES results in the following error:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 27
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3756)
    at java.base/java.lang.String.substring(String.java:1902)
    at org.aksw.limes.core.io.parser.Parser.getTerms(Parser.java:177)
    at org.aksw.limes.core.io.parser.Parser.<init>(Parser.java:40)
    at org.aksw.limes.core.io.ls.LinkSpecification.readSpec(LinkSpecification.java:187)
    at org.aksw.limes.core.io.ls.LinkSpecification.<init>(LinkSpecification.java:76)
    at org.aksw.limes.core.controller.LSPipeline.execute(LSPipeline.java:51)
    at org.aksw.limes.core.controller.Controller.getMapping(Controller.java:214)
    at org.aksw.limes.core.controller.Controller.getMapping(Controller.java:177)
    at org.aksw.limes.core.controller.Controller.main(Controller.java:87)

Environment

LIMES started via java -Xmx10G -jar ~/opt/limes/limes-core/target/limes-core-1.7.4-SNAPSHOT.jar, master branch, version 1.7.4-SNAPSHOT, commit ae81ba402c67e89ceb23f8cb872b01f5a5e25419. OpenJDK 14 on Arch Linux.

Full Log

$ limes snik-dbpedia.xml 
2020-07-02 16:11:36,412 main INFO Log4j appears to be running in a Servlet environment, but there's no log4j-web module available. If you want better web container support, please add the log4j-web JAR to your web archive or server lib directory.
16:11:36.496 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:115 - Checking for file /home/konrad/projekte/snik/ontology/limes/dbpedia/cache/-1099036643.ser
16:11:36.502 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:118 - Found cached data. Loading data from file /home/konrad/projekte/snik/ontology/limes/dbpedia/cache/-1099036643.ser
16:11:36.526 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:124 - Cached data loaded successfully from file /home/konrad/projekte/snik/ontology/limes/dbpedia/cache/-1099036643.ser
16:11:36.526 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:125 - Size = 240
16:11:36.526 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:115 - Checking for file /home/konrad/projekte/snik/ontology/limes/dbpedia/cache/-1467538100.ser
16:11:36.526 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:132 - No cached data found for dbpedia
16:11:36.527 [main] [] INFO  org.aksw.limes.core.io.query.QueryModuleFactory:18 - Generating <TURTLE> reader
Trying to get reader TURTLE
16:11:43.108 [main] [] INFO  org.aksw.limes.core.io.query.FileQueryModule:55 - RDF model read from dbpedia-ascii-1m.ttl is of size 1000000
16:11:43.108 [main] [] INFO  LIMES:32 - Registry = [dbpedia-ascii-1m.ttl]
16:11:43.109 [main] [] INFO  org.aksw.limes.core.io.query.SparqlQueryModule:251 - Query issued is 
PREFIX bb: <http://www.snik.eu/ontology/bb/>
PREFIX ob: <http://www.snik.eu/ontology/ob/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX meta: <http://www.snik.eu/ontology/meta/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?dbpedia ?v1
WHERE {
?dbpedia rdfs:label ?v1 .
}
16:11:43.109 [main] [] INFO  org.aksw.limes.core.io.query.SparqlQueryModule:57 - Querying the endpoint.
16:11:43.109 [main] [] INFO  org.aksw.limes.core.io.query.SparqlQueryModule:72 - Getting statements 0 to -1
16:11:48.250 [main] [] INFO  org.aksw.limes.core.io.query.SparqlQueryModule:158 - Retrieved 1000000 triples and 1000000 entities.
16:11:48.250 [main] [] INFO  org.aksw.limes.core.io.query.SparqlQueryModule:159 - Retrieving statements took 5.141 seconds.
16:11:48.251 [main] [] INFO  org.aksw.limes.core.io.cache.HybridCache:316 - Serializing 1000000 objects to /home/konrad/projekte/snik/ontology/limes/dbpedia/cache/-1467538100.ser
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 27
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3756)
    at java.base/java.lang.String.substring(String.java:1902)
    at org.aksw.limes.core.io.parser.Parser.getTerms(Parser.java:177)
    at org.aksw.limes.core.io.parser.Parser.<init>(Parser.java:40)
    at org.aksw.limes.core.io.ls.LinkSpecification.readSpec(LinkSpecification.java:187)
    at org.aksw.limes.core.io.ls.LinkSpecification.<init>(LinkSpecification.java:76)
    at org.aksw.limes.core.controller.LSPipeline.execute(LSPipeline.java:51)
    at org.aksw.limes.core.controller.Controller.getMapping(Controller.java:214)
    at org.aksw.limes.core.controller.Controller.getMapping(Controller.java:177)
    at org.aksw.limes.core.controller.Controller.main(Controller.java:87)

KonradHoeffner commented 4 years ago

Configuration and source files: snik-dbpedia.zip

KonradHoeffner commented 4 years ago

The error seems to occur only when using OR, as in <METRIC>OR(exactMatch(x.label,y.label),exactMatch(x.altLabel,y.label))</METRIC>. It works with <METRIC>exactMatch(snik.label,dbpedia.label)</METRIC>, at least so far.

kvndrsslr commented 4 years ago

You need to specify thresholds for all subexpressions within operators. For example, <METRIC>OR(exactMatch(x.label,y.label),exactMatch(x.altLabel,y.label))</METRIC> should be <METRIC>OR(exactMatch(x.label,y.label)|0.8,exactMatch(x.altLabel,y.label)|0.8)</METRIC> if you want to use 0.8 as the threshold.
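
To illustrate why a missing threshold could surface as this particular exception: the reported length of 27 is exactly the length of the first subexpression, exactMatch(x.label,y.label), and "end -1" is what a failed search for the | separator would yield. The following is only a minimal sketch under that assumption; measurePart is a hypothetical helper and not the actual code in Parser.getTerms.

```java
final class MissingThresholdDemo {
    // Hypothetical helper (not LIMES code): returns the part of
    // "measure(...)|threshold" that precedes the last '|'.
    static String measurePart(String term) {
        int bar = term.lastIndexOf('|');  // -1 when no threshold is given
        return term.substring(0, bar);    // throws when bar == -1
    }

    public static void main(String[] args) {
        // With a threshold, the measure expression is extracted normally.
        System.out.println(measurePart("exactMatch(x.label,y.label)|0.8"));

        // Without a threshold, substring(0, -1) fails on the 27-character term.
        try {
            measurePart("exactMatch(x.label,y.label)");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```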

KonradHoeffner commented 4 years ago

Thanks for the clarification! However, I think a more helpful error message would be good in that case, for example "ThresholdMissingException: you need to specify thresholds for all subexpressions within operators". P.S.: Does 0.8 make sense with exact match? I thought this is only ever 0 or 1.
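
A check along the lines suggested above might look like the following sketch. Everything here is hypothetical: ThresholdMissingException and thresholdOf are illustrative names that do not exist in LIMES, and the check only handles a single atomic subexpression rather than nested operators.

```java
final class ThresholdCheck {
    /** Hypothetical exception named after the suggestion above; not part of LIMES. */
    static final class ThresholdMissingException extends IllegalArgumentException {
        ThresholdMissingException(String term) {
            super("You need to specify thresholds for all subexpressions "
                    + "within operators; none found in: " + term);
        }
    }

    // Hypothetical helper: reads the threshold after the last '|' of an
    // atomic term such as "exactMatch(x.label,y.label)|0.8".
    static double thresholdOf(String term) {
        int bar = term.lastIndexOf('|');
        if (bar < 0) {
            // Fail early with a descriptive message instead of letting
            // substring(0, -1) throw StringIndexOutOfBoundsException.
            throw new ThresholdMissingException(term);
        }
        return Double.parseDouble(term.substring(bar + 1).trim());
    }

    public static void main(String[] args) {
        System.out.println(thresholdOf("exactMatch(x.label,y.label)|0.8"));
        try {
            thresholdOf("exactMatch(x.altLabel,y.label)");
        } catch (ThresholdMissingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Extending IllegalArgumentException keeps the sketch unchecked, so existing callers would not need new throws clauses; whether that fits the LIMES code base is an open question.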