SDM-TIB / SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction
https://doi.org/10.5281/zenodo.3872103
Apache License 2.0

Cannot Create RDF from two Constants #113

Closed SemanticSculptor closed 3 months ago

SemanticSculptor commented 3 months ago

Describe the bug: If I write a mapping that uses a constant for both the subject and the object, no errors are raised, but no triples are produced.

To Reproduce:

@prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# .
@prefix rr: http://www.w3.org/ns/r2rml# .
@prefix rml: http://semweb.mmlab.be/ns/rml# .

<#Table123IBEMapping> a rr:TriplesMap;
    rml:logicalSource [
        rml:source "Table123" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [
        rr:constant "http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123" ;
        rr:class cco:DataTable ;
        rr:class owl:NamedIndividual
    ];
    #Table123IBEToTable123ICEMap
    rr:predicateObjectMap [
        rr:predicate "http://purl.obolibrary.org/obo/BFO_0000050" ;
        rr:objectMap [
            rr:constant "https://example.com/ontology/foo_00000017/Table123" ;
        ]
    ] .

Expected output:

http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123 http://purl.obolibrary.org/obo/BFO_0000050 https://example.com/ontology/foo_00000017/Table123

eiglesias34 commented 3 months ago

Hello @SemanticSculptor,

Thank you for using the SDM-RDFizer. I found and fixed the issue. Please test it on your end so that we can close this issue.

Sincerely, Enrique Iglesias

lindsayjgc commented 3 months ago

Hey, I'm working with @SemanticSculptor and am running into an error:

TypeError: SubjectMap.__init__() takes from 4 to 7 positional arguments but 10 were given

Here is my updated R2RML (added some namespaces and fixed their syntax):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix cco: <http://example.com/cco#> .

<#Table123IBEMapping> a rr:TriplesMap;
    rml:logicalSource [
        rml:source "DIP.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [
        rr:constant "http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123" ;
        rr:class cco:DataTable ;
        rr:class owl:NamedIndividual
    ];
    #Table123IBEToTable123ICEMap
    rr:predicateObjectMap [
        rr:predicate "http://purl.obolibrary.org/obo/BFO_0000050" ;
        rr:objectMap [
            rr:constant "https://example.com/ontology/foo_00000017/Table123" ;
        ]
    ] .

If I remove the logical source, it runs without an error but produces no triples. Is this the case for you? Am I using the tool correctly?

In the old version, if I used rr:template instead of rr:constant with a logical source, it ran but produced duplicate triples for each record, as expected with a template.

lindsayjgc commented 3 months ago

Hi @eiglesias34,

I just realized that I was previously using version 4.7.3.5. When I installed the two latest versions released before your fix (4.7.4.1 and 4.7.4) and ran the R2RML that worked on version 4.7.3.5, both gave me the same error:

TypeError: SubjectMap.__init__() takes from 4 to 7 positional arguments but 10 were given
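For readers unfamiliar with this kind of message, the following minimal sketch shows how Python arrives at those numbers. The `SubjectMap` class here is a hypothetical stand-in with illustrative parameter names, not SDM-RDFizer's actual class; the only assumption carried over from the thread is a constructor with three required and three optional parameters being called with nine values. Python counts `self` as well, which is why the message reports "4 to 7" and "10".

```python
class SubjectMap:
    """Hypothetical stand-in; parameter names are illustrative only."""

    def __init__(self, value, condition, subject_type,
                 rdf_class=None, term_type=None, graph=None):
        # Three required plus three optional parameters; with `self`
        # that is 4 to 7 positional arguments from Python's point of view.
        self.value = value
        self.condition = condition
        self.subject_type = subject_type
        self.rdf_class = rdf_class
        self.term_type = term_type
        self.graph = graph


def try_construct(*args):
    """Return the TypeError text if construction fails, else None."""
    try:
        SubjectMap(*args)
    except TypeError as err:
        return str(err)
    return None


# A caller updated for a newer, longer signature passes nine values,
# which together with `self` makes ten positional arguments.
message = try_construct(*range(9))
print(message)
```

The practical reading is that the call site and the class definition were out of sync, which matches the maintainer's later description of a subject map call with extra parameters.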

eiglesias34 commented 3 months ago

Hello @lindsayjgc,

Thank you for using SDM-RDFizer. I found the subject map call with the extra parameters and fixed it. A triples map in RML must always have a logical source defined. If it is not specified, it will be treated as a function map, and a function map doesn't generate triples but transforms data (I know it is confusing, but I don't make the rules). I ran it with your mapping after I made the change, and I got this result:

<http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/cco#DataTable>.
<http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#NamedIndividual>.
<http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123> <http://purl.obolibrary.org/obo/BFO_0000050> <https://example.com/ontology/foo_00000017/Table123>.

As far as I can tell, this is the intended output: the subject map defines two classes, which produce two rdf:type triples, and the single predicate-object map produces one more triple.

Please test it out on your side to ensure that the problem is solved. SDM-RDFizer recently underwent a major update, and since a couple of new modules were added, I may not have uploaded everything, which might explain the current issue. Hopefully, that is not the case.

Thank you again for using SDM-RDFizer. Sincerely, Enrique Iglesias

lindsayjgc commented 3 months ago

Hi @eiglesias34,

Thank you so much for helping us so quickly. The code now runs without error and produces triples, but it appears that rr:constant is working like rr:template and the result has far too many triples. If I remove duplicates, I do get the result you received, which is indeed the intended output.

eiglesias34 commented 3 months ago

Hello again,

That is good to hear. Regarding the generation of extra triples, when there are two or more classes, the parser sees them as two different triples maps instead of just one. I'm trying to solve this problem.

I will close this issue since the problem has been solved. If you need to contact me again, please reopen it.

Sincerely, Enrique Iglesias

lindsayjgc commented 3 months ago

Hey @eiglesias34,

As I am not a collaborator, I cannot reopen an issue that a collaborator has closed.

I don't believe the issue is the multiple classes. I think the issue is that it creates the triples once for each row in the table: I have 50,000 records, so it creates 50,000 duplicates (which is why I didn't want to have a data source, but I understand why it is necessary).

eiglesias34 commented 3 months ago

Hello again,

That makes more sense, since SDM-RDFizer goes through the whole data source regardless of the subject map's type. Unfortunately, this triples map is a special case, and I can't assume that every triples map with a constant subject will generate only a handful of triples and can ignore the rest of the data source.
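The behaviour described here can be illustrated with a small simulation. This is a sketch of row-by-row evaluation in general, not SDM-RDFizer's actual code: the function name `constant_triples` and the generated rows are hypothetical, while the IRIs are taken from the mapping earlier in the thread. Because every term is a constant, each row yields the same three triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def constant_triples(rows, subject, classes, predicate, obj):
    """Emit the constant triples once per source row.

    The row contents are never read: constant-valued term maps do not
    depend on the data, yet the mapping is still evaluated for each row.
    """
    triples = []
    for _row in rows:
        for cls in classes:
            triples.append((subject, RDF_TYPE, cls))
        triples.append((subject, predicate, obj))
    return triples


# Hypothetical source with 50,000 records, as in the report above.
rows = [{"id": i} for i in range(50_000)]

triples = constant_triples(
    rows,
    subject="http://www.ontologyrepository.com/CommonCoreOntologies/DataTable/Table123",
    classes=[
        "http://example.com/cco#DataTable",
        "http://www.w3.org/2002/07/owl#NamedIndividual",
    ],
    predicate="http://purl.obolibrary.org/obo/BFO_0000050",
    obj="https://example.com/ontology/foo_00000017/Table123",
)

unique = set(triples)
print(len(triples), len(unique))  # 150000 3
```

Removing duplicates collapses the output to the three triples shown earlier, but only after all 50,000 rows have been visited.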

I'll reopen the issue for the time being. I'll rebrand it as a "question" instead of a "bug."

Feel free to ask any more questions as needed.

Sincerely, Enrique Iglesias

lindsayjgc commented 3 months ago

Thank you so much for fixing the bug! I guess our solution is to keep remove_duplicates on.
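To make the effect of duplicate removal concrete, here is a minimal sketch that de-duplicates N-Triples lines while keeping first-seen order. It is an independent illustration, not the way SDM-RDFizer implements its option, and the helper name `dedupe_lines` and the short sample IRIs are hypothetical.

```python
def dedupe_lines(lines):
    """Yield each distinct non-empty line once, in first-seen order."""
    seen = set()
    for line in lines:
        key = line.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        yield key


# Hypothetical output in which every statement appears several times.
raw_output = [
    "<http://example.com/s> <http://example.com/p> <http://example.com/o>.",
    "<http://example.com/s> <http://example.com/p> <http://example.com/o>.",
    "",
    "<http://example.com/s> <http://example.com/q> <http://example.com/o>.",
    "<http://example.com/s> <http://example.com/p> <http://example.com/o>.",
]

deduped = list(dedupe_lines(raw_output))
print(len(raw_output), len(deduped))  # 5 2
```

A set of seen lines keeps memory proportional to the number of distinct triples rather than the size of the raw output, which is small when the triples come from constants.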

SemanticSculptor commented 3 months ago

THANK YOU so much for your help! We really appreciate your responsiveness and commitment to quality.

eiglesias34 commented 3 months ago

Hello,

I noticed something: you can use a smaller data source for the triples maps that only have constants. This will remove the need to go through thousands of rows and remove duplicates (at least for those triples maps).
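One way to prepare such a small data source is sketched below with Python's standard `csv` module. The file name `constants_only.csv`, the column name `placeholder`, and the helper `write_single_row_csv` are all hypothetical. The sketch writes one data row on the assumption that triples are generated per row, so a file with no data rows would presumably yield no triples.

```python
import csv
import tempfile
from pathlib import Path


def write_single_row_csv(path, header=("placeholder",), row=("x",)):
    """Write a CSV with a header line and exactly one data row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerow(row)
    return Path(path)


# Write the file to a temporary directory and read it back to confirm
# that it contains a single record.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = write_single_row_csv(Path(tmp_dir) / "constants_only.csv")
    with open(csv_path, newline="", encoding="utf-8") as handle:
        data_rows = list(csv.DictReader(handle))

print(len(data_rows))  # 1
```

The constants-only triples map would then name this file in `rml:source`, while the triples maps that actually read data keep pointing at the full table.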

Since @lindsayjgc opened a new issue, I will close this one. Any future communication can be done through there.

Thank you again for using SDM-RDFizer.

Sincerely, Enrique Iglesias

SemanticSculptor commented 3 months ago

Oh, thank you for the obvious solution! We will just use a blank.csv.