SPARQL-Anything / sparql.anything

SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.
https://sparql-anything.cc/
Apache License 2.0

[html] Invalid local name #31

Open enridaga opened 3 years ago

enridaga commented 3 years ago

An exception occurs with details:

Exception in thread "main" org.apache.jena.shared.InvalidPropertyURIException: http://www.w3.org/1999/xhtml#http:
    at org.apache.jena.rdf.model.impl.PropertyImpl.checkLocalName(PropertyImpl.java:66)
    at org.apache.jena.rdf.model.impl.PropertyImpl.<init>(PropertyImpl.java:55)
    at org.apache.jena.rdf.model.ResourceFactory$Impl.createProperty(ResourceFactory.java:296)
    at org.apache.jena.rdf.model.ResourceFactory.createProperty(ResourceFactory.java:144)

enridaga commented 3 years ago

Another similar case, when scraping a web page:

[main] ERROR com.github.spiceh2020.sparql.anything.engine.FacadeXOpExecutor - An error occurred
java.io.IOException: java.net.URISyntaxException: Illegal character in fragment at index 29: http://www.w3.org/1999/xhtml#"
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.triplify(HTMLTriplifier.java:108)
    at com.github.spiceh2020.sparql.anything.engine.FacadeXOpExecutor.triplify(FacadeXOpExecutor.java:265)
    at com.github.spiceh2020.sparql.anything.engine.FacadeXOpExecutor.getDatasetGraph(FacadeXOpExecutor.java:138)
    at com.github.spiceh2020.sparql.anything.engine.FacadeXOpExecutor.execute(FacadeXOpExecutor.java:170)
    at org.apache.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:211)
    at org.apache.jena.sparql.algebra.op.OpService.visit(OpService.java:56)
    at org.apache.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:46)
    at org.apache.jena.sparql.engine.main.OpExecutor.exec(OpExecutor.java:118)
    at org.apache.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:89)
    at org.apache.jena.sparql.engine.main.QC.execute(QC.java:52)
    at com.github.spiceh2020.sparql.anything.engine.FacadeXOpExecutor$1.nextStage(FacadeXOpExecutor.java:210)
    at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:108)
    at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
    at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
    at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38)
    at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
    at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38)
    at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
    at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:347)
    at org.apache.jena.ext.com.google.common.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1312)
    at org.apache.jena.ext.com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1328)
    at org.apache.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:219)
    at org.apache.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:207)
    at com.github.spiceh2020.sparql.anything.cli.SPARQLAnything.executeQuery(SPARQLAnything.java:131)
    at com.github.spiceh2020.sparql.anything.cli.SPARQLAnything.main(SPARQLAnything.java:542)
Caused by: java.net.URISyntaxException: Illegal character in fragment at index 29: http://www.w3.org/1999/xhtml#"
    at java.base/java.net.URI$Parser.fail(URI.java:2938)
    at java.base/java.net.URI$Parser.checkChars(URI.java:3109)
    at java.base/java.net.URI$Parser.parse(URI.java:3153)
    at java.base/java.net.URI.<init>(URI.java:623)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:139)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.populate(HTMLTriplifier.java:151)
    at com.github.spiceh2020.sparql.anything.html.HTMLTriplifier.triplify(HTMLTriplifier.java:106)
    ... 24 more

enridaga commented 3 years ago

Maybe we should implement a strategy in which the resulting IRI is validated first and, if an exception occurs, the offending string is URL-encoded in some way. However, this should already be happening in the Triplifier, so the problem may be limited to the HTML Triplifier using its own URI-building code.
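
As a rough illustration of that strategy (not the project's actual code), the sketch below validates the candidate IRI with `java.net.URI` and falls back to percent-encoding the local part when parsing fails. The class `SafeIri`, the method `toIri`, and the constant `XHTML_NS` are hypothetical names introduced only for this example. Note that this check would catch the `URISyntaxException` case above, but probably not the `InvalidPropertyURIException` one, since `http://www.w3.org/1999/xhtml#http:` appears to parse as a URI and is rejected only later by Jena's property local-name check.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of SPARQL Anything.
class SafeIri {
    // Namespace seen in the stack traces above.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml#";

    // Returns namespace + localName if that parses as a URI;
    // otherwise returns the namespace followed by the URL-encoded local name.
    static String toIri(String namespace, String localName) {
        String candidate = namespace + localName;
        try {
            new URI(candidate); // validation only; the parsed object is discarded
            return candidate;
        } catch (URISyntaxException e) {
            // Fallback: percent-encode the part that came from the HTML document.
            return namespace + URLEncoder.encode(localName, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        System.out.println(toIri(XHTML_NS, "href")); // valid, returned unchanged
        System.out.println(toIri(XHTML_NS, "\""));   // invalid fragment, gets encoded
    }
}
```

`URLEncoder` applies form encoding (for example, spaces become `+`), so a real fix might prefer a dedicated IRI-escaping routine; it is used here only because it is in the standard library.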

luigi-asprino commented 1 year ago

Maybe this is no longer the case? @enridaga, do you remember which web page raised the error?