SPARQL-Anything / sparql.anything

SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.
https://sparql-anything.cc/
Apache License 2.0

Is it possible to generate BlankNodes from data references? #271

Open dachafra opened 2 years ago

dachafra commented 2 years ago

The behavior should be similar to the one in RML:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example/> .
@prefix : <http://example.org/> .
@base <http://example.org/> .

:firstTM a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "data.csv";
        rml:referenceFormulation ql:CSV
    ];
    rml:subjectMap [
        rml:reference "c1" ;
        rr:termType rr:BlankNode
    ];
    rr:predicateObjectMap [
        rr:predicate ex:p ;
        rml:objectMap [
            rr:template "http://example/{c2}"
        ]
    ] .

Input

c1,c2
b0,A

Output:

_:b0 ex:p ex:A

enridaga commented 2 years ago

You can just construct bnodes:

PREFIX ex: <http://example/> 
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

CONSTRUCT {
 [] ex:p ?A
} WHERE {
 SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c2 ?A
 }
}

or, if you want to control the bnode identifier for some reason:

PREFIX ex: <http://example/> 
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

CONSTRUCT {
 ?bnode ex:p ?A
} WHERE {
 SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c1 ?b0 ; xyz:c2 ?A
 }
 BIND ( BNODE ( ?b0 ) as ?bnode ) 
}

dachafra commented 2 years ago

I had got this far, yes, but you cannot take the identifier of the blank node from the input source, right?

enridaga commented 2 years ago

I had got this far, yes, but you cannot take the identifier of the blank node from the input source, right?

You can take it from there, as you can see in the second query. I am not sure I get the use case here. Do you mean that you want to keep the blank node identifier in the generated graph? The generated blank node ids depend on the serialiser. BNode identifiers are supposed to be local and are usually generated during serialisation or during data loading. So, what's the point of forcing them? If you want to mint an identifier, you probably want an IRI instead. Am I getting it right?
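To make the "mint an IRI instead" suggestion concrete, here is a minimal sketch written in plain Python rather than SPARQL Anything. It reads the two-column CSV from the issue and emits N-Triples with IRI subjects. The `http://example/` base, the property name `p`, and the function name `rows_to_ntriples` are assumptions chosen to match the `ex:` prefix used above, not something the engine provides.

```python
import csv
import io
from urllib.parse import quote

# Assumed namespace, mirroring the ex: prefix used in the queries above.
BASE = "http://example/"


def rows_to_ntriples(csv_text):
    """Turn each CSV row into one N-Triples line with an IRI subject."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the cell values so they are safe inside an IRI.
        subject = BASE + quote(row["c1"], safe="")
        obj = BASE + quote(row["c2"], safe="")
        lines.append(f"<{subject}> <{BASE}p> <{obj}> .")
    return lines


print("\n".join(rows_to_ntriples("c1,c2\nb0,A\n")))
```

Because the subject is an IRI, it stays the same across serialisations and loads, which is the property a forced blank node label cannot guarantee.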

justin2004 commented 2 years ago

you could do this:

curl --silent 'http://localhost:3000/sparql.anything'  \
--header "Accept: text/csv" \
--data-urlencode 'query=
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/input.csv" ;
                  fx:csv.headers  true .
        ?s        ?p              ?o
        BIND(iri(?s) AS ?s_iri)
      }
  }
'

yielding:

s p o s_iri
_:b0 http://sparql.xyz/facade-x/data/c1 b0 _:file:/app/input.csv##row1
_:b0 http://sparql.xyz/facade-x/data/c2 A _:file:/app/input.csv##row1
_:b1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://sparql.xyz/facade-x/ns/root _:file:/app/input.csv#
_:b1 http://www.w3.org/1999/02/22-rdf-syntax-ns#_1 _:b0 _:file:/app/input.csv#

justin2004 commented 2 years ago

oh, i know what you want now. one minute.

justin2004 commented 2 years ago

it appears that apache jena does not let you synthesize a bnode identifier manually. this is as close as i can get but neither quad is what you are looking for (one isn't a well formed quad and i'm not sure about the other). though i think an actual IRI is what i would use in practice.

curl --silent 'http://localhost:3000/sparql.anything'  \
--header "Accept: application/n-quads" \
--data-urlencode 'query=
PREFIX  :     <http://example.com/>
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
CONSTRUCT 
  { 
    ?new_s_iri :p ?new_c2 .
    ?new_s_str :p ?new_c2 .
  }
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/input.csv" ;
                  fx:csv.headers  true .
        ?s        xyz:c1          ?c1 ;
                  xyz:c2          ?c2
        BIND(iri(concat("_:", ?c1)) AS ?new_s_iri)
        BIND(concat("_:", ?c1) AS ?new_s_str)
        BIND(iri(concat(str(:), ?c2)) AS ?new_c2)
      }
  }
'

yields:

"_:b0" <http://example.com/p> <http://example.com/A> .
<_:b0> <http://example.com/p> <http://example.com/A> .

dachafra commented 2 years ago

@justin2004 yeah, exactly! I was able to obtain the same results, but I don't think that any of the results are valid RDF, right?

Just so you know, this comes from one of the R2RML test cases: https://www.w3.org/2001/sw/rdb2rdf/test-cases/#R2RMLTC0002b. It is not that I specifically want this feature in the engine; it is more for comparing both solutions. One of the main benefits of having this feature is that identifiers do not have to be kept in memory during execution.

enridaga commented 2 years ago

I don't think it is possible to control the blank nodes that are generated by the serializer, but this is probably a question for users@jena.apache.org.

However, while playing with this use case I found an interesting issue when one wants to generate multiple triples with the same bnode on different construct template projections. At the moment, a new bnode is generated for every projection, even if we use the BNODE function. This is reproducible by adding more rows to the example CSV. A new bnode is created for each one of them. I will open a separate issue for that.

justin2004 commented 2 years ago

At the moment, a new bnode is generated for every projection, even if we use the BNODE function.

I thought I just wasn't understanding how to use bnode() with an argument but since you might have also expected different behavior I opened an issue: https://issues.apache.org/jira/browse/JENA-2340
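For context, the SPARQL 1.1 definition of BNODE with a string argument says the same label gives the same blank node only within one solution mapping, and a distinct one for other solutions. The sketch below is a plain-Python simulation of that scoping rule, not Jena code; the helper names (`new_bnode`, `construct`) and the `_:genid` labels are made up for illustration.

```python
import itertools

# Hypothetical generator of fresh blank node labels.
_counter = itertools.count()


def new_bnode():
    return f"_:genid{next(_counter)}"


def construct(rows):
    """Emit two triples per row, calling a BNODE-like function on column c1."""
    triples = []
    for row in rows:
        labels = {}  # label -> blank node, reset for every solution (row)

        def bnode(label):
            # Same label within one solution gives the same blank node.
            if label not in labels:
                labels[label] = new_bnode()
            return labels[label]

        triples.append((bnode(row["c1"]), "ex:p", row["c2"]))
        triples.append((bnode(row["c1"]), "ex:q", row["c2"]))
    return triples


rows = [{"c1": "b0", "c2": "A"}, {"c1": "b0", "c2": "B"}]
for triple in construct(rows):
    print(triple)
```

Under this reading, two CSV rows that share the value `b0` still end up with different blank nodes, which matches the behavior described above. Whether that is the behavior one wants from the engine is a separate question.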

enridaga commented 2 years ago

Just so you know, this comes from one of the R2RML test cases: https://www.w3.org/2001/sw/rdb2rdf/test-cases/#R2RMLTC0002b. It is not that I specifically want this feature in the engine; it is more for comparing both solutions.

Considering they are bnodes, the comparison can be done via graph isomorphism (there are some useful utils for this in Jena).
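As a rough illustration of comparing by isomorphism, here is a brute-force sketch in plain Python. It assumes triples are tuples of N-Triples-style strings in which blank nodes start with `_:`, and it tries every one-to-one relabelling of blank nodes, so it is only suitable for tiny test graphs. All function names are hypothetical; in Jena itself, `Model.isIsomorphicWith` covers the same need.

```python
from itertools import permutations


def is_bnode(term):
    # Assumption: blank nodes are written as "_:label".
    return term.startswith("_:")


def bnodes(graph):
    """Return the sorted list of distinct blank node labels in a graph."""
    return sorted({t for triple in graph for t in triple if is_bnode(t)})


def isomorphic(g1, g2):
    """True if some bijection between blank nodes maps g1 onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


expected = {("_:b0", "<http://example/p>", "<http://example/A>")}
actual = {("_:genid1", "<http://example/p>", "<http://example/A>")}
print(isomorphic(expected, actual))
```

With this kind of check, the test case output `_:b0 ex:p ex:A` and an engine output that uses a different blank node label compare as equal, so the label from the input source does not need to be preserved.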