Edelweiss closed 6 years ago
Step-by-step analysis to track down the error…
[1] Compare papyri and DCLP (navigator, master branch, mapping for HGV)
(files are identical: git diff papyri/master...dclp/master -- pn-mapping/xslt/hgv-rdf.xsl)
git remote -v
dclp git@github.com:DCLP/navigator.git (fetch)
dclp git@github.com:DCLP/navigator.git (push)
papyri git@github.com:papyri/navigator.git (fetch)
papyri git@github.com:papyri/navigator.git (push)
[2] Run test scenario on HGV file 4760 (idp.data, master branch, HGV file 4760.xml)
(files are identical: git diff papyri/master...dclp/master -- HGV_meta_EpiDoc/HGV5/4760.xml)
git remote -v
dclp git@github.com:DCLP/idp.data.git (fetch)
dclp git@github.com:DCLP/idp.data.git (push)
papyri git@github.com:papyri/idp.data.git (fetch)
papyri git@github.com:papyri/idp.data.git (push)
[3] Compare output using vimdiff (turn indentation on for test run, omit list of namespaces in output)
<rdf:Description rdf:about="http://papyri.info/hgv/4760/source">
<dct:identifier>papyri.info/hgv/4760</dct:identifier>
<dct:identifier>tm:4760</dct:identifier>
<dct:identifier>
<rdf:Description rdf:about="http://papyri.info/hgv/BGU_7_1510">
<dct:identifier rdf:resource="http://papyri.info/hgv/4760/source"/>
</rdf:Description>
</dct:identifier>
<dct:isPartOf>
<rdf:Description rdf:about="http://papyri.info/hgv/BGU_7">
<dct:bibliographicCitation>BGU 7</dct:bibliographicCitation>
<rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
<dct:isPartOf>
<rdf:Description rdf:about="http://papyri.info/hgv/BGU">
<rdf:type rdf:resource="http://purl.org/ontology/bibo/Series"/>
<dct:bibliographicCitation>BGU</dct:bibliographicCitation>
<dct:isPartOf rdf:resource="http://papyri.info/hgv"/>
</rdf:Description>
</dct:isPartOf>
</rdf:Description>
</dct:isPartOf>
<dct:relation rdf:resource="http://www.trismegistos.org/text/4760"/>
<dct:relation>
<rdf:Description rdf:about="http://papyri.info/trismegistos/4760">
<dct:relation rdf:resource="http://papyri.info/hgv/4760/source"/>
</rdf:Description>
</dct:relation>
<dct:source>
<rdf:Description rdf:about="http://papyri.info/hgv/4760/work">
<dct:bibliographicCitation>BGU 7, 1510</dct:bibliographicCitation>
</rdf:Description>
</dct:source>
<rdfs:label>BGU</rdfs:label>
<foaf:page>
<rdf:Description rdf:about="http://papyri.info/hgv/4760">
<foaf:topic rdf:resource="http://papyri.info/hgv/4760/source"/>
</rdf:Description>
</foaf:page>
</rdf:Description>
(files are identical: vimdiff ~/Desktop/papyri_rdf.xml ~/Desktop/dclp_rdf.xml)
[4] Compare RDFs as produced by the numbers server
→ different xpath hierarchy
blocker for #346
Hugh in an e-mail:
It looks to me like there’s no difference in content. My guess is that your more-recent version of Jena is just serializing RDF XML differently, but it’s the same RDF. The takeaway is that we shouldn’t be using XPath to parse RDF because it’s the wrong tool for the job. Honestly, I’m surprised we got away with it as long as we did. I can suggest two possible solutions:
1) (quick and dirty) Just rewrite the XPaths in lib/numbers_rdf.rb to the new format.
2) Use Ruby-RDF to extract the data instead. Rewrite numbers_rdf.rb to do the right things.
I’m inclined to do #2, and probably will. But if you need to just get it working very quickly, there’d be no harm in #1 as an interim solution.
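To make the serialization point concrete, here is a minimal Python sketch. It is not the Ruby code in lib/numbers_rdf.rb and not a real RDF parser. It only shows that the same dct:relation statements can be written either as rdf:resource attributes or as nested rdf:Description elements, and that a fixed XPath sees only one of the two. The function name relation_targets and both sample documents are assumptions made up for illustration; the nested sample mimics the vimdiff output above.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DCT = "{http://purl.org/dc/terms/}"  # the namespace the XPaths bind to "dcterms"


def relation_targets(rdf_xml, subject):
    """Collect the dct:relation targets of `subject` in either layout."""
    root = ET.fromstring(rdf_xml)
    targets = []
    for desc in root.iter(RDF + "Description"):
        if desc.get(RDF + "about") != subject:
            continue
        for rel in desc.findall(DCT + "relation"):
            # Flat layout: <dct:relation rdf:resource="..."/>
            target = rel.get(RDF + "resource")
            if target is None:
                # Nested layout: <dct:relation><rdf:Description rdf:about="...">
                nested = rel.find(RDF + "Description")
                if nested is not None:
                    target = nested.get(RDF + "about")
            if target is not None and target not in targets:
                targets.append(target)
    return targets


NS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
      'xmlns:dct="http://purl.org/dc/terms/"')
SUBJECT = "http://papyri.info/hgv/4760/source"

# Hypothetical flat serialization: every object is an rdf:resource attribute.
FLAT = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="http://papyri.info/hgv/4760/source">
    <dct:relation rdf:resource="http://www.trismegistos.org/text/4760"/>
    <dct:relation rdf:resource="http://papyri.info/trismegistos/4760"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://papyri.info/trismegistos/4760">
    <dct:relation rdf:resource="http://papyri.info/hgv/4760/source"/>
  </rdf:Description>
</rdf:RDF>"""

# Nested serialization, shaped like the vimdiff output above.
NESTED = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="http://papyri.info/hgv/4760/source">
    <dct:relation rdf:resource="http://www.trismegistos.org/text/4760"/>
    <dct:relation>
      <rdf:Description rdf:about="http://papyri.info/trismegistos/4760">
        <dct:relation rdf:resource="http://papyri.info/hgv/4760/source"/>
      </rdf:Description>
    </dct:relation>
  </rdf:Description>
</rdf:RDF>"""

print(relation_targets(FLAT, SUBJECT) == relation_targets(NESTED, SUBJECT))
```

This is closer to option 1 made tolerant of both layouts than to option 2. An RDF library would compare the triples directly and would not care about element nesting at all.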
»replaces« constraint remains unclear (last part of the xpath)
/rdf:RDF/rdf:Description[@rdf:about='http://#{identifier}/source']/dcterms:relation/
@rdf:resource[not(. =//dcterms:replaces/@rdf:resource)]
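As a reading aid for the predicate at the end of this XPath: it keeps a relation target only if no dcterms:replaces element anywhere in the document points at the same resource. The sketch below restates that logic in Python under stated assumptions. The function name unreplaced_relations, the example.org identifier and the sample document are all hypothetical, and the code only handles the flat layout the XPath itself expects.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DCT = "{http://purl.org/dc/terms/}"  # bound to the prefix "dcterms" in the XPath


def unreplaced_relations(rdf_xml, identifier):
    """Python restatement of the XPath, including its not(...) predicate."""
    root = ET.fromstring(rdf_xml)
    subject = "http://" + identifier + "/source"
    # //dcterms:replaces/@rdf:resource : every replaces target in the document
    replaced = {el.get(RDF + "resource") for el in root.iter(DCT + "replaces")}
    kept = []
    # /rdf:RDF/rdf:Description[@rdf:about=...] : direct children of the root
    for desc in root.findall(RDF + "Description"):
        if desc.get(RDF + "about") != subject:
            continue
        # dcterms:relation/@rdf:resource[not(. = ...)]
        for rel in desc.findall(DCT + "relation"):
            target = rel.get(RDF + "resource")
            if target is not None and target not in replaced:
                kept.append(target)
    return kept


# Hypothetical document: two relations, one of which is also a replaces target.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/x/source">
    <dct:relation rdf:resource="http://example.org/a/source"/>
    <dct:relation rdf:resource="http://example.org/b/source"/>
    <dct:replaces rdf:resource="http://example.org/b/source"/>
  </rdf:Description>
</rdf:RDF>"""

print(unreplaced_relations(SAMPLE, "example.org/x"))
```

If a document contains no dcterms:replaces element at all, the set of replaced targets is empty and the predicate filters nothing.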
I would have expected one of the following two examples to come with a »replaces« tag, as one file replaces the other
http://papyri.info/ddbdp/bgu;1;1/rdf
http://papyri.info/ddbdp/p.louvre;1;4/rdf
as defined in the reprint clause
<ref n="p.louvre;1;4" type="reprint-in">P.Louvre 1.4</ref>
Here is where the »replaces« tag is written
https://github.com/DCLP/navigator/blob/master/pn-mapping/xslt/dclp-rdf.xsl#L78
reprint definition taken from P. Louvre I 4
<body>
<head n="11853" xml:lang="en">
<date>AD -166</date>
<placeName>Soknopaiou Nesos</placeName>
<ref n="bgu;1;1|bgu;1;337|chr.wilck;;92" type="reprint-from">Chrest.Wilck. 92, BGU 1 337 (col 1 only), BGU 1 1 (col 2 only)</ref>
</head>
…
</body>
Example P. Louvre I 4, which is a reprint from BGU I 1 and various other publications (bgu;1;1|bgu;1;337|chr.wilck;;92):
Even though the reprint information is in the EpiDoc file
https://github.com/DCLP/idp.data/blob/master/DDB_EpiDoc_XML/p.louvre/p.louvre.1/p.louvre.1.4.xml#L58
and even though the xslt obviously picks up the information
https://github.com/DCLP/navigator/blob/master/pn-mapping/xslt/dclp-rdf.xsl#L75
the reprint definition doesn’t appear in the final RDF
http://papyri.info/ddbdp/p.louvre;1;4/rdf
In the RDF there’s no connection whatsoever to BGU I 1 and the other publications.
But the »replaces« relations are there in the xml code generated by ddbdp-rdf.xsl
<dct:replaces rdf:resource="http://papyri.info/ddbdp/bgu;1;1/source"/>
<dct:replaces rdf:resource="http://papyri.info/ddbdp/bgu;1;337/source"/>
<dct:replaces rdf:resource="http://papyri.info/ddbdp/chr.wilck;;92/source"/>
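The three dct:replaces lines above correspond one-to-one to the pipe-separated identifiers in the n attribute of the reprint-from ref. A small Python sketch of that correspondence follows. It is not the stylesheet's code, the function name replaces_targets is made up, and the URI pattern is simply inferred from the three lines quoted here.

```python
def replaces_targets(n_attr, base="http://papyri.info/ddbdp/"):
    """Turn a reprint-from n attribute into dct:replaces target URIs.

    Assumption: each pipe-separated identifier maps to base + id + "/source",
    as the three quoted dct:replaces lines suggest.
    """
    return [base + part + "/source" for part in n_attr.split("|") if part]


for uri in replaces_targets("bgu;1;1|bgu;1;337|chr.wilck;;92"):
    print('<dct:replaces rdf:resource="%s"/>' % uri)
```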
I therefore consider the »replaces« constraint obsolete and will omit it from the xpath
compare xpaths from different rdf files that refer to the same TM no:
xpath as expected by SoSOL’s interface to the numbers server:
RDF » Description » relation » @resource