SDM-TIB / SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction
https://doi.org/10.5281/zenodo.3872103
Apache License 2.0

Not defining a rr:class leads to artifacts in the output #106

Closed SHA-T closed 11 months ago

SHA-T commented 11 months ago

Hi,

**Description**
If you don't define an rr:class within the rr:subjectMap in the mapping file (it should be optional), the output file contains artifacts, as shown below. I assume the reason is that, for each subject, SDM-RDFizer still reserves the first output line for the triple <subject> a <class>; and then prints only the ; without the triple, when it shouldn't print that line at all.
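
The suspected cause can be illustrated with a minimal Python sketch. It is not SDM-RDFizer's actual code; the function name `serialize_subject` and its parameters are hypothetical. The idea is to treat the class triple as just one more predicate-object pair, so that nothing is reserved or printed for it when rr:class is absent.

```python
from typing import Iterable, List, Optional, Tuple


def serialize_subject(subject: str,
                      rdf_class: Optional[str],
                      predicate_objects: Iterable[Tuple[str, str]]) -> str:
    """Serialize all triples of one subject as a single Turtle statement.

    Hypothetical helper for illustration only (not SDM-RDFizer's API).
    """
    pairs: List[Tuple[str, str]] = []
    if rdf_class is not None:
        # The class triple is optional: add it only when rr:class is defined.
        pairs.append(("a", rdf_class))
    pairs.extend(predicate_objects)
    if not pairs:
        return ""  # nothing to state about this subject: emit no line at all

    # The subject is written once, on the first line only.
    first_predicate, first_object = pairs[0]
    lines = [f"{subject} {first_predicate} {first_object}"]
    # Every further pair is a continuation line joined with ';'.
    lines += [f"        {p} {o}" for p, o in pairs[1:]]
    return ";\n".join(lines) + "."


print(serialize_subject(
    "<http://project-iasis.eu/BioType/miRNA>",
    None,  # no rr:class defined in the mapping
    [("iasis:isRelatedTo", "<http://project-iasis.eu/Chromosome/chr1>")],
))
```

Because the ; is only ever used as a separator between pairs that actually exist, a subject without a class starts directly with its first predicate-object pair.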

**Reproduce**
Steps to reproduce the behavior using the provided default example:

  1. Remove the rr:class definition in line 20 in ./example/mapping.ttl
  2. Run SDM-RDFizer
  3. Output:
    
    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex: <http://example.com/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix rml: <http://semweb.mmlab.be/ns/rml#> .
    @prefix ql: <http://semweb.mmlab.be/ns/ql#> .
    @prefix iasis: <http://project-iasis.eu/vocab/> .
    @base <http://project-iasis.eu/> .

    ; iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

    ; iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

    ; iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

    ; iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

    ; iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

    ; ; ; <http://project-iasis.eu/Chromosome/chr1> a iasis:chr.


**Expected Output**

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix iasis: <http://project-iasis.eu/vocab/> .
@base <http://project-iasis.eu/> .

<http://project-iasis.eu/BioType/processed_transcript> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/transcribed_unprocessed_pseudogene> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/unprocessed_pseudogene> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/miRNA> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/lincRNA> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/Chromosome/chr1> a iasis:chr.



**OS**
I've encountered this bug on both Arch Linux 6.4.12-arch1-1 and Windows 11.

eiglesias34 commented 11 months ago

Dear @SHA-T,

First of all, thank you for using the SDM-RDFizer. I was able to find the problem and fix it. Please test it out and tell me if the problem is solved on your side so that we can close this issue.

Sincerely, Enrique Iglesias

SHA-T commented 11 months ago

Hi Enrique,

I did a fresh clone, but the bug is still there. The output is still the same.

Best regards Telman

eiglesias34 commented 11 months ago

Hello again,

That's strange. I'll check what happened.

eiglesias34 commented 11 months ago

Hello again,

There was a weird conflict with what I uploaded with the previous version. Everything should be fine now. I ran it again, and I got the correct result.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix iasis: <http://project-iasis.eu/vocab/> .
@base <http://project-iasis.eu/> .

<http://project-iasis.eu/BioType/processed_transcript> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/transcribed_unprocessed_pseudogene> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/unprocessed_pseudogene> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/miRNA> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/BioType/lincRNA> iasis:isRelatedTo <http://project-iasis.eu/Chromosome/chr1>.

<http://project-iasis.eu/Chromosome/chr1> a iasis:chr.

Right now I am making a new release of the library.

eiglesias34 commented 11 months ago

New release done

SHA-T commented 11 months ago

Thank you for the fix! That problem is solved, but now a subject is repeated after a ; when it is used in more than two consecutive triples:

<https://www.ncbi.nlm.nih.gov/gene/0009796> rdf:type <https://www.wikidata.org/wiki/Q8054>;                                      # correct
        rdfs:label "PHYHIP";                                                                                             # correct
<https://www.ncbi.nlm.nih.gov/gene/0009796> <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0056992>. # subject repeated

<https://www.ncbi.nlm.nih.gov/gene/0007918> rdf:type <https://www.wikidata.org/wiki/Q8054>;                                      # correct
        rdfs:label "GPANK1";                                                                                             # correct
<https://www.ncbi.nlm.nih.gov/gene/0007918> <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0009240>. # subject repeated

<https://www.ncbi.nlm.nih.gov/gene/0008233> rdf:type <https://www.wikidata.org/wiki/Q8054>;                                      # correct
        rdfs:label "ZRSR2";                                                                                              # correct
<https://www.ncbi.nlm.nih.gov/gene/0008233> <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0023548>. # subject repeated

In the comments on the right I marked the triples where the subject was repeated.

eiglesias34 commented 11 months ago

Hello again @SHA-T, I updated the SDM-RDFizer. I also made sure that other cases were covered as well. Test it out so that we can be sure that everything is good on your side.

Sincerely, Enrique

SHA-T commented 11 months ago

The problem with repeated subjects after a ; is gone. But now subjects are missing after a '.':

<https://www.ncbi.nlm.nih.gov/gene/0001280> rdf:type <https://www.wikidata.org/wiki/Q8054>;
        rdfs:label "COL2A1";
        <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0064714>.

        <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0246176>.

<https://www.ncbi.nlm.nih.gov/gene/0001508> rdf:type <https://www.wikidata.org/wiki/Q8054>;
        rdfs:label "CTSB";
        <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0006794>.

        <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0006599>.

        <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0203068>.

        <https://www.wikidata.org/wiki/Q896177> <https://www.ncbi.nlm.nih.gov/gene/0004088>.
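
The three malformed patterns reported in this thread (a line starting with ;, a subject repeated after ;, and a subject missing after .) can be caught with a small line-based check. This is a rough heuristic sketch, not a Turtle parser and not part of SDM-RDFizer; `check_turtle_lines` is a hypothetical name, and it assumes one predicate-object pair per line, as in the outputs above.

```python
import re
from typing import List, Tuple

# An IRI in angle brackets, a quoted literal (optionally with datatype or
# language tag), a comment running to the end of the line, or any other
# whitespace-free token such as a prefixed name or a lone terminator.
TOKEN = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^\S+|@[\w-]+)?|#.*|\S+')


def check_turtle_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for suspicious Turtle lines."""
    problems: List[Tuple[int, str]] = []
    expect_subject = True  # true at the start and after a statement ends with '.'
    for number, raw in enumerate(text.splitlines(), start=1):
        tokens = [t for t in TOKEN.findall(raw) if not t.startswith("#")]
        if not tokens or tokens[0].startswith("@"):
            continue  # skip blank lines, comment-only lines, @prefix and @base

        # Split the trailing ';' or '.' off the last token.
        last = tokens[-1]
        terminator = last[-1] if last[-1] in ";." else ""
        if terminator:
            tokens[-1] = last[:-1]
        terms = [t for t in tokens if t]

        if terms and terms[0] == ";":
            problems.append((number, "line starts with ';'"))
        elif expect_subject and len(terms) == 2:
            problems.append((number, "subject missing after '.'"))
        elif not expect_subject and len(terms) == 3:
            problems.append((number, "subject repeated after ';'"))

        if terminator:
            expect_subject = terminator == "."
    return problems
```

Running it on a generated .ttl file's contents gives a quick list of line numbers to inspect; an empty list does not prove the file is valid Turtle, only that none of these three patterns was found.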

eiglesias34 commented 11 months ago

OK, almost there. Can you send me a sample of what you are transforming? Just so I have a better idea of what I'm missing.

SHA-T commented 11 months ago

Sure, I'm constructing PrimeKG. You can reconstruct what I'm doing by cloning this repo and following these 3 steps: https://github.com/SHA-T/create_primeKG#how-to-use (It will clone SDM-RDFizer within the directory, preprocess the 'kg.csv' file that you downloaded in step 1, and use the mapping.ttl file for the transformation.)

eiglesias34 commented 11 months ago

Hello @SHA-T,

I ran it and found the problem. Everything is good now. Please test it out. To be clear, Turtle is not the only output format. There is also "n-triples", which generates everything as individual triples.
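
As a rough illustration of why the N-Triples format sidesteps this class of bug: every triple is written on its own line with its full subject and full IRIs (no prefixes), so there is no ; grouping to get wrong. The helper name `to_ntriples` below is hypothetical and not part of SDM-RDFizer.

```python
from typing import Iterable, List, Tuple


def to_ntriples(subject: str,
                predicate_objects: Iterable[Tuple[str, str]]) -> List[str]:
    """Emit one complete 'subject predicate object .' line per pair."""
    return [f"{subject} {predicate} {obj} ."
            for predicate, obj in predicate_objects]


lines = to_ntriples(
    "<https://www.ncbi.nlm.nih.gov/gene/0009796>",
    [
        ("<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
         "<https://www.wikidata.org/wiki/Q8054>"),
        ("<http://www.w3.org/2000/01/rdf-schema#label>", '"PHYHIP"'),
    ],
)
print("\n".join(lines))
```

The trade-off is a larger file, since the subject is repeated on every line.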

Sincerely, Enrique

mevs commented 11 months ago

Dear Telman,

Many thanks for contacting us and for your interest in our SDM-RDFizer. We wonder if you would be interested in linking PrimeKG to DBpedia, Wikidata, or UMLS. If so, we could assist you and share our tools for named entity recognition and linking.

Best regards, Prof. Dr. Maria-Esther Vidal


SHA-T commented 11 months ago

Hello Prof. Vidal,

I'm creating PrimeKG on behalf of Can. So yes, I would gladly take a shot with those tools.

@eiglesias34 Thank you for the fixes! I haven't tried it lately, but I'm confident since it works on your side. So, feel free to mark this as solved.

Best regards Telman

EDIT: I've now tried it with your latest update, and it looks good. Thanks again!

eiglesias34 commented 11 months ago

I'm going to close the issue. Thank you for using the SDM-RDFizer.