mjordan closed this issue 10 years ago.
That's why that field is always blank!
I was wrong: it is not blank for the new External and Redirect datastreams. It used to work, as evidenced by https://gist.github.com/mjordan/8250658 (this is an early example, pre-'agent'), so we introduced a bug somewhere along the line. None of the other examples in my gists have values, so the bug is pretty old.
Which reminds me: we should probably insert a 'date generated' comment in the PREMIS XML. I'm surprised PREMIS has no header, like the METS header, that contains metadata about the XML file itself.
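A minimal sketch of that 'date generated' comment, assuming the PREMIS output is post-processed in PHP before it is served. The function name `islandora_premis_add_generated_comment()` is hypothetical, not part of the module; it uses DOMDocument to insert an XML comment with a UTC timestamp ahead of the root element.

```php
<?php
/**
 * Hypothetical helper (not part of islandora_premis): prepend a
 * "date generated" comment to a PREMIS document.
 */
function islandora_premis_add_generated_comment(string $premis_xml, ?DateTimeInterface $when = null): string {
  // Default to the current time in UTC; tests can pass a fixed time.
  $when = $when ?? new DateTimeImmutable('now', new DateTimeZone('UTC'));
  $doc = new DOMDocument();
  $doc->loadXML($premis_xml);
  $comment = $doc->createComment(' PREMIS generated ' . $when->format(DATE_ATOM) . ' ');
  // Put the comment before the root element so it sits at the top of the file.
  $doc->insertBefore($comment, $doc->documentElement);
  return $doc->saveXML();
}

// Usage with a placeholder PREMIS root element.
print islandora_premis_add_generated_comment('<premis xmlns="info:lc/xmlns/premis-v2" version="2.2"/>');
```

Doing this in PHP rather than in the stylesheet avoids relying on date functions, which XSLT 1.0 (what libxslt implements) does not provide.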
Giving up for tonight and committing a couple of other small changes to the XSL. The problem is definitely with selecting the value of foxml:contentLocation/@REF, not with the variable assignment.
OK. I'm with you on the frustration. I just spent an hour trying to figure it out, and it doesn't make any sense at all.
I can get it to print anything in this tree, except contentLocation!
<foxml:datastream ID="JPG" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
<foxml:datastreamVersion ID="JPG.0" LABEL="Medium sized JPEG" CREATED="2013-11-08T12:49:38.851Z" MIMETYPE="image/jpeg" SIZE="237956">
<foxml:contentDigest TYPE="SHA-1" DIGEST="9045e6ff00de22cd33b271dfeed65df51a733a80"/>
<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+JPG+JPG.0"/>
</foxml:datastreamVersion>
</foxml:datastream>
Here is the full FOXML:
<?xml version="1.0" encoding="UTF-8"?>
<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" VERSION="1.1" PID="yul:89067" xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
<foxml:objectProperties>
<foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="&quot;City of Dover&quot; : bought by Penetang group"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="nruest"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#createdDate" VALUE="2013-11-08T12:49:34.259Z"/>
<foxml:property NAME="info:fedora/fedora-system:def/view#lastModifiedDate" VALUE="2013-12-31T08:02:53.659Z"/>
</foxml:objectProperties>
<foxml:datastream ID="AUDIT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="false">
<foxml:datastreamVersion ID="AUDIT.0" LABEL="Audit Trail for this object" CREATED="2013-11-08T12:49:34.259Z" MIMETYPE="text/xml" FORMAT_URI="info:fedora/fedora-system:format/xml.fedora.audit">
<foxml:xmlContent>
<audit:auditTrail xmlns:audit="info:fedora/fedora-system:def/audit#">
<audit:record ID="AUDREC1">
<audit:process type="Fedora API-M"/>
<audit:action>addDatastream</audit:action>
<audit:componentID>TECHMD_FITS</audit:componentID>
<audit:responsibility>nruest</audit:responsibility>
<audit:date>2013-11-08T12:49:38.223Z</audit:date>
<audit:justification>Copied datastream from yul:89067.</audit:justification>
</audit:record>
<audit:record ID="AUDREC2">
<audit:process type="Fedora API-M"/>
<audit:action>addDatastream</audit:action>
<audit:componentID>TN</audit:componentID>
<audit:responsibility>nruest</audit:responsibility>
<audit:date>2013-11-08T12:49:38.531Z</audit:date>
<audit:justification>Copied datastream from yul:89067.</audit:justification>
</audit:record>
<audit:record ID="AUDREC3">
<audit:process type="Fedora API-M"/>
<audit:action>addDatastream</audit:action>
<audit:componentID>JPG</audit:componentID>
<audit:responsibility>nruest</audit:responsibility>
<audit:date>2013-11-08T12:49:38.851Z</audit:date>
<audit:justification>Copied datastream from yul:89067.</audit:justification>
</audit:record>
<audit:record ID="AUDREC4">
<audit:process type="Fedora API-M"/>
<audit:action>addDatastream</audit:action>
<audit:componentID>JP2</audit:componentID>
<audit:responsibility>nruest</audit:responsibility>
<audit:date>2013-11-08T12:49:39.306Z</audit:date>
<audit:justification>Copied datastream from yul:89067.</audit:justification>
</audit:record>
<audit:record ID="AUDREC5">
<audit:process type="Fedora API-M"/>
<audit:action>modifyObject</audit:action>
<audit:componentID/>
<audit:responsibility>anonymous</audit:responsibility>
<audit:date>2013-12-31T08:02:52.959Z</audit:date>
<audit:justification>PREMIS:eventType=fixity check; PREMIS:file=yul:89067+MODS+MODS.0; PREMIS:eventOutcome=SHA-1 checksum validated.
</audit:justification>
</audit:record>
<audit:record ID="AUDREC6">
<audit:process type="Fedora API-M"/>
<audit:action>modifyObject</audit:action>
<audit:componentID/>
<audit:responsibility>anonymous</audit:responsibility>
<audit:date>2013-12-31T08:02:53.058Z</audit:date>
<audit:justification>PREMIS:eventType=fixity check; PREMIS:file=yul:89067+DC+DC.0; PREMIS:eventOutcome=SHA-1 checksum validated.
</audit:justification>
</audit:record>
<audit:record ID="AUDREC7">
<audit:process type="Fedora API-M"/>
<audit:action>modifyObject</audit:action>
<audit:componentID/>
<audit:responsibility>anonymous</audit:responsibility>
<audit:date>2013-12-31T08:02:53.659Z</audit:date>
<audit:justification>PREMIS:eventType=fixity check; PREMIS:file=yul:89067+OBJ+OBJ.0; PREMIS:eventOutcome=SHA-1 checksum validated.
</audit:justification>
</audit:record>
</audit:auditTrail>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
<foxml:datastreamVersion ID="RELS-EXT.0" LABEL="Fedora Object to Object Relationship Metadata." CREATED="2013-11-08T12:49:34.259Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="544">
<foxml:contentDigest TYPE="SHA-1" DIGEST="8acd007d964a3bf29e44d0978c1369051a6abbd1"/>
<foxml:xmlContent>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:islandora="http://islandora.ca/ontology/relsext#">
<rdf:Description rdf:about="info:fedora/yul:89067">
<fedora:isMemberOfCollection rdf:resource="info:fedora/yul:F0433"/>
<fedora-model:hasModel rdf:resource="info:fedora/islandora:sp_large_image_cmodel"/>
</rdf:Description>
</rdf:RDF>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="MODS" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
<foxml:datastreamVersion ID="MODS.0" LABEL="MODS Record" CREATED="2013-11-08T12:49:34.259Z" MIMETYPE="text/xml" SIZE="2387">
<foxml:contentDigest TYPE="SHA-1" DIGEST="a94e53eb3f379cdd43594ae55047583400a08bbb"/>
<foxml:xmlContent>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<identifier type="local">ASC02928</identifier>
<identifier type="hdl">http://hdl.handle.net/10315/1849</identifier>
<location>
<physicalLocation>1974-002 / 192 (858)</physicalLocation>
</location>
<titleInfo>
<title>"City of Dover" : bought by Penetang group</title>
</titleInfo>
<abstract>Image of small ship at dock in ice; large ship is at dock in background; probably Master Feeds in distance</abstract>
<targetAudience>ASC Red Dot</targetAudience>
<name>
<namePart>Toronto Telegram</namePart>
<role>
<roleTerm authority="marcrelator" type="text">Publisher</roleTerm>
</role>
</name>
<originInfo>
<dateCreated>2000/03/31</dateCreated>
<dateIssued>1949/04/05</dateIssued>
<publisher>Toronto Telegram</publisher>
<place>
<placeTerm type="text">Canada</placeTerm>
</place>
<place>
<placeTerm type="text">Toronto</placeTerm>
</place>
</originInfo>
<typeOfResource>still image</typeOfResource>
<genre authority="lctgm">Documentary Photography</genre>
<language>
<languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>
<physicalDescription>
<form>nonprojected graphic</form>
<extent>1 photograph : b&amp;w negative ; 10 x 13 cm</extent>
</physicalDescription>
<note>Box 1 CD 1B</note>
<relatedItem>
<titleInfo>
<title>Toronto Telegram fonds, F0433</title>
</titleInfo>
<location>
<location>
<url note="Finding Aid">http://archivesfa.library.yorku.ca/fonds/ON00370-f0000433.htm</url>
</location>
</location>
</relatedItem>
<subject>
<topic>Toronto Telegram</topic>
</subject>
<accessCondition type="useAndReproduction">For further copyright information contact : ascproj@yorku.ca</accessCondition>
</mods>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
<foxml:datastreamVersion ID="DC.0" LABEL="DC Record" CREATED="2013-11-08T12:49:34.259Z" MIMETYPE="text/xml" SIZE="1276">
<foxml:contentDigest TYPE="SHA-1" DIGEST="ad931b32519134be6074e22da8f332f79f268585"/>
<foxml:xmlContent>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>"City of Dover" : bought by Penetang group</dc:title>
<dc:subject>Toronto Telegram</dc:subject>
<dc:description>Image of small ship at dock in ice; large ship is at dock in background; probably Master Feeds in distance</dc:description>
<dc:description>Box 1 CD 1B</dc:description>
<dc:publisher>Toronto Telegram</dc:publisher>
<dc:contributor>Toronto Telegram (Publisher)</dc:contributor>
<dc:type>StillImage</dc:type>
<dc:type>Documentary Photography</dc:type>
<dc:format>1 photograph : b&amp;w negative ; 10 x 13 cm</dc:format>
<dc:format>nonprojected graphic</dc:format>
<dc:identifier>yul:89067</dc:identifier>
<dc:identifier>ASC02928</dc:identifier>
<dc:identifier>http://hdl.handle.net/10315/1849</dc:identifier>
<dc:identifier/>
<dc:language>eng</dc:language>
<dc:relation>Toronto Telegram fonds, F0433</dc:relation>
<dc:rights>For further copyright information contact : ascproj@yorku.ca</dc:rights>
</oai_dc:dc>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="OBJ" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
<foxml:datastreamVersion ID="OBJ.0" LABEL="OBJ Datastream" CREATED="2013-11-08T12:49:34.259Z" MIMETYPE="image/tiff" SIZE="16129943">
<foxml:contentDigest TYPE="SHA-1" DIGEST="ca62dacfbe4ec9ea85c140295102cefabbb72b4d"/>
<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+OBJ+OBJ.0"/>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="TECHMD_FITS" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
<foxml:datastreamVersion ID="TECHMD_FITS.0" LABEL="TECHMD_FITS" CREATED="2013-11-08T12:49:38.223Z" MIMETYPE="text/xml" SIZE="8659">
<foxml:contentDigest TYPE="SHA-1" DIGEST="9f777fbf664f54ec0e8df96a7e37a3179a96d9be"/>
<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+TECHMD_FITS+TECHMD_FITS.0"/>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="TN" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
<foxml:datastreamVersion ID="TN.0" LABEL="Thumbnail" CREATED="2013-11-08T12:49:38.531Z" MIMETYPE="image/jpeg" SIZE="31133">
<foxml:contentDigest TYPE="SHA-1" DIGEST="dfdf5f8fdf9bc741a423cef6f9c7de3d925d4094"/>
<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+TN+TN.0"/>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="JPG" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
<foxml:datastreamVersion ID="JPG.0" LABEL="Medium sized JPEG" CREATED="2013-11-08T12:49:38.851Z" MIMETYPE="image/jpeg" SIZE="237956">
<foxml:contentDigest TYPE="SHA-1" DIGEST="9045e6ff00de22cd33b271dfeed65df51a733a80"/>
<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+JPG+JPG.0"/>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream ID="JP2" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
<foxml:datastreamVersion ID="JP2.0" LABEL="JPEG 2000" CREATED="2013-11-08T12:49:39.306Z" MIMETYPE="image/jp2" SIZE="336316">
<foxml:contentDigest TYPE="SHA-1" DIGEST="3309e618a40456a72970d966a0697c2790e705ee"/>
<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+JP2+JP2.0"/>
</foxml:datastreamVersion>
</foxml:datastream>
</foxml:digitalObject>
I take that back: I can't get anything from datastreamVersion to print either, only contentDigest. Progress?
Haven't started on this one yet today. I'll merge in the branch mentioned in #14 first.
I pushed pretty-printed XML. If that messes up your merge, let me know and I'll roll back.
Yeah, please do. I just tried to merge and got conflicts. I've done a reset --hard, so I'm back on c3ede93f6d6f237f4570d5c5096d840f0e8b2e7e.
done
Pulled but still have conflicts. Let me take a look.
This is weird: when I git pull, I end up back at c3ede93f6d6f237f4570d5c5096d840f0e8b2e7e and git tells me I'm up to date, but https://github.com/ruebot/islandora_premis/commits/7.x shows 9836700920 as the latest commit. Same in two different browsers, so it's not a cache issue. Any idea why the discrepancy?
c3ede93 was the pretty-print commit. I got rid of it in origin. HEAD should be at 9836700 now, which is your last commit, and where you said not to do anything... which I violated.
So should I revert back to 9836700920b245dabc78b9026bb650da6e13d759 in my local copy?
yeah!
git reset --hard HEAD~1
on your 7.x branch should do it.
OK, thanks, back at 9836700920b245dabc78b9026bb650da6e13d759. Let me try my merge again.
OK, success, will push if you think it's OK.
PUSH!
I'll continue hacking on my end... but pretty print first. If you're a vim user, this is killer: :%!xmllint --format -
(pretty prints and validates!)
Pushed.
I, sir, am a vim user. Thanks!
nmap <Leader>xml :%!xmllint --format -<CR>
/foxml:digitalObject/foxml:datastream/foxml:datastreamVersion/foxml:contentLocation/@REF
should do it, right?
This is what I am getting here:
You can try it. But foxml:datastreamVersion is the context node, so I don't see why foxml:contentLocation/@REF wouldn't work. Similar context-node queries work elsewhere.
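A minimal PHP sketch of the context-node query being described, using DOMXPath instead of XSLT so it can be run in isolation. The FOXML is a trimmed sample modelled on the Object XML posted earlier in this thread, and `content_location_for()` is a hypothetical helper. With each foxml:datastreamVersion as the context node, the relative path foxml:contentLocation/@REF picks up only that version's REF, and yields an empty string where the version has no contentLocation.

```php
<?php
// Namespace URI as declared in the FOXML in this thread.
const FOXML_NS = 'info:fedora/fedora-system:def/foxml#';

/**
 * Hypothetical helper: for each foxml:datastreamVersion, evaluate the
 * relative path foxml:contentLocation/@REF with that version as the
 * context node. Returns [version ID => REF], '' where nothing matches.
 */
function content_location_for(string $foxml): array {
  $doc = new DOMDocument();
  $doc->loadXML($foxml);
  $xpath = new DOMXPath($doc);
  $xpath->registerNamespace('foxml', FOXML_NS);
  $refs = [];
  foreach ($xpath->query('//foxml:datastreamVersion') as $version) {
    // string() of an empty node-set is '', i.e. an empty contentLocationValue.
    $refs[$version->getAttribute('ID')] =
      $xpath->evaluate('string(foxml:contentLocation/@REF)', $version);
  }
  return $refs;
}

// Trimmed sample: one inline (X) datastream, one Managed (M) datastream.
$foxml = <<<XML
<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="yul:89067">
  <foxml:datastream ID="DC" CONTROL_GROUP="X">
    <foxml:datastreamVersion ID="DC.0"><foxml:xmlContent/></foxml:datastreamVersion>
  </foxml:datastream>
  <foxml:datastream ID="JPG" CONTROL_GROUP="M">
    <foxml:datastreamVersion ID="JPG.0">
      <foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+JPG+JPG.0"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
XML;

print_r(content_location_for($foxml));
```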
BTW, Saxon does give me some output using the current foxml_to_premis.xsl. Could be a stupid, obscure PHP or libxslt bug. But it did work in earlier versions of the stylesheet; that's what I don't understand.
Really strange... I poked at this a little bit. It is a puzzle. It seems to be the XPath, but the path is right.
It works if you change the content_location variable to this:
<xsl:variable name="content_location" select="//foxml:contentLocation/@REF"/>
The variable is declared further down as well; is it working there?
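One caveat worth checking with that workaround, sketched in PHP with DOMXPath on a trimmed two-datastream sample (not a real export). `//foxml:contentLocation/@REF` is an absolute path, so it ignores the current datastreamVersion and selects every REF in the document. In XSLT 1.0 (what libxslt implements), `xsl:value-of` on such a node-set outputs only the first node in document order, so every datastream would likely end up with the same value. XPath 1.0 `string()` applies the same first-node rule, which the sketch uses to show the effect.

```php
<?php
$foxml = <<<XML
<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="yul:89067">
  <foxml:datastream ID="OBJ">
    <foxml:datastreamVersion ID="OBJ.0">
      <foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+OBJ+OBJ.0"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
  <foxml:datastream ID="JPG">
    <foxml:datastreamVersion ID="JPG.0">
      <foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+JPG+JPG.0"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
XML;

$doc = new DOMDocument();
$doc->loadXML($foxml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('foxml', 'info:fedora/fedora-system:def/foxml#');

$absolute = [];
$relative = [];
foreach ($xpath->query('//foxml:datastreamVersion') as $version) {
  $id = $version->getAttribute('ID');
  // Absolute path: the context node is ignored; string() takes the first match in the document.
  $absolute[$id] = $xpath->evaluate('string(//foxml:contentLocation/@REF)', $version);
  // Relative path: only the contentLocation under this datastreamVersion.
  $relative[$id] = $xpath->evaluate('string(foxml:contentLocation/@REF)', $version);
}
print_r($absolute); // both entries are yul:89067+OBJ+OBJ.0
print_r($relative); // each entry has its own REF
```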
Nick, I ran the FOXML you provided previously through the XSLT you linked to, and Oxygen transforms it and the elements get populated. See it here: https://gist.github.com/dmoses/8322788 (the transformer I'm using is Saxon 6.5.5).
The plot thickens...
I am getting expected values in contentLocationValue when I run the following CLI PHP script, which BTW is essentially the same code as we're running in the module. Also, foxml_to_premis.xsl is the same one that is not working for me in the module:
<?php
// Load the stylesheet and the FOXML file to transform.
$xsl_doc = new DOMDocument();
$xsl_doc->load("foxml_to_premis.xsl");
$xml_doc = new DOMDocument();
$xml_doc->load("changeme_15.fox.xml");
// Same steps as the module: import the stylesheet, transform, print the PREMIS.
$xslt_proc = new XSLTProcessor();
$xslt_proc->importStylesheet($xsl_doc);
$output = $xslt_proc->transformToXML($xml_doc);
print $output;
?>
When I grep the output, I get:
<contentLocationValue/>
<contentLocationValue/>
<contentLocationValue/>
<contentLocationValue/>
<contentLocationValue>changeme:15+OBJ+OBJ.0</contentLocationValue>
<contentLocationValue>changeme:15+TECHMD+TECHMD.0</contentLocationValue>
<contentLocationValue>changeme:15+TN+TN.0</contentLocationValue>
<contentLocationValue>changeme:15+MEDIUM_SIZE+MEDIUM_SIZE.0</contentLocationValue>
But the changeme:15+OBJ+OBJ.0 etc. values do not appear in the version generated via the module. Same stylesheet. This has to be a PHP issue.
This is a shameful hack, but what if we added XML parsing code to the islandora_premis_run_xsl_transform() function to grab the value of foxml:contentLocation/@REF and passed it into the stylesheet as a parameter? We may never figure out what is causing this truly cruel bug.
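A minimal sketch of that hack, assuming the REF values are collected with DOMXPath inside islandora_premis_run_xsl_transform() before the transform and handed to the stylesheet as top-level parameters. The function name and the `content_location_<version ID>` parameter naming are hypothetical, and the stylesheet would need a matching `xsl:param` for each name it uses. `XSLTProcessor::setParameter()` accepts an array of name/value pairs, so the whole set can be passed in one call.

```php
<?php
/**
 * Hypothetical helper for the proposed hack: collect every
 * foxml:contentLocation/@REF, keyed by a parameter name derived from
 * the datastreamVersion ID (e.g. content_location_TN.0).
 */
function islandora_premis_content_location_params(DOMDocument $foxml): array {
  $xpath = new DOMXPath($foxml);
  $xpath->registerNamespace('foxml', 'info:fedora/fedora-system:def/foxml#');
  $params = [];
  // Only versions that actually have a contentLocation child.
  foreach ($xpath->query('//foxml:datastreamVersion[foxml:contentLocation]') as $version) {
    $name = 'content_location_' . $version->getAttribute('ID');
    $params[$name] = $xpath->evaluate('string(foxml:contentLocation/@REF)', $version);
  }
  return $params;
}

$doc = new DOMDocument();
$doc->loadXML('<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#">'
  . '<foxml:datastream ID="TN"><foxml:datastreamVersion ID="TN.0">'
  . '<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+TN+TN.0"/>'
  . '</foxml:datastreamVersion></foxml:datastream>'
  . '<foxml:datastream ID="DC"><foxml:datastreamVersion ID="DC.0"><foxml:xmlContent/>'
  . '</foxml:datastreamVersion></foxml:datastream></foxml:digitalObject>');
$params = islandora_premis_content_location_params($doc);
print_r($params); // only content_location_TN.0 => yul:89067+TN+TN.0

// With ext/xsl loaded, the array could then be passed before transforming:
// $xslt_proc->setParameter('', $params);
```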
Let's try it, and see what happens!
@dmoses good to know! @edf shared a Perl transformer that worked as well. So, there is definitely something super wonky here that we're not seeing.
I can try it tonight, but feel free to take a stab sooner if you have time.
Found the problem... this is frigging hilarious.
When exported (which is what we do in this module), FOXML doesn't contain any foxml:contentLocation elements for Managed datastreams. Instead, the content for those datastreams is embedded within the XML itself as base64 strings in foxml:binaryContent elements. See https://gist.github.com/mjordan/8329267 for an example of the FOXML we are applying our stylesheet to. Go to line 1081 to see the OBJ datastream file embedded in the XML.
So, we aren't getting any matches for foxml:contentLocation/@REF for Managed datastreams because there aren't any of those elements in exported FOXML that we're running through the stylesheet.
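Given that explanation, a minimal sketch for checking which flavour of FOXML the stylesheet is actually receiving: for each datastreamVersion it reports whether the content is referenced (foxml:contentLocation), embedded as base64 (foxml:binaryContent), or inline XML (foxml:xmlContent). The helper name is hypothetical, and the two sample strings are trimmed stand-ins (the base64 payload is just a placeholder), not real exports.

```php
<?php
/**
 * Hypothetical helper: map each datastreamVersion ID to the child element
 * that carries its content: 'contentLocation', 'binaryContent',
 * 'xmlContent', or 'none'.
 */
function managed_content_form(string $foxml): array {
  $doc = new DOMDocument();
  $doc->loadXML($foxml);
  $xpath = new DOMXPath($doc);
  $xpath->registerNamespace('foxml', 'info:fedora/fedora-system:def/foxml#');
  $forms = [];
  foreach ($xpath->query('//foxml:datastreamVersion') as $version) {
    $form = 'none';
    foreach (['contentLocation', 'binaryContent', 'xmlContent'] as $child) {
      if ($xpath->evaluate('count(foxml:' . $child . ')', $version) > 0) {
        $form = $child;
        break;
      }
    }
    $forms[$version->getAttribute('ID')] = $form;
  }
  return $forms;
}

// Archive-style export: Managed content embedded as base64 (placeholder payload).
$archive = '<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#">'
  . '<foxml:datastream ID="OBJ" CONTROL_GROUP="M"><foxml:datastreamVersion ID="OBJ.0">'
  . '<foxml:binaryContent>SGVsbG8=</foxml:binaryContent>'
  . '</foxml:datastreamVersion></foxml:datastream></foxml:digitalObject>';
// Object XML view: the same datastream referenced by contentLocation instead.
$object_xml = str_replace(
  '<foxml:binaryContent>SGVsbG8=</foxml:binaryContent>',
  '<foxml:contentLocation TYPE="INTERNAL_ID" REF="yul:89067+OBJ+OBJ.0"/>',
  $archive
);

print_r(managed_content_form($archive));    // [OBJ.0] => binaryContent
print_r(managed_content_form($object_xml)); // [OBJ.0] => contentLocation
```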
The reason I wrote that particular XSL to match on foxml:contentLocation/@REF is that within the Fedora Web Administrator, the "Object XML" does contain foxml:contentLocation elements. For example, the FOXML snippet for the OBJ datastream that I referred you to in the gist linked above is:
Really glad you figured out the weirdness!
Just a thought: instead of 'archive', could 'migrate' be used on line 110 in utilities.inc to reduce the file size of the binary stuff?
Awesome suggestion: with 'migrate' FOXML you get usable contentLocation values:
<foxml:contentLocation TYPE="INTERNAL_ID" REF="http://localhost:8080/fedora/get/changeme:20/TECHMD/2013-12-18T06:52:46.100Z"/>
As long as we're happy with this sort of URL (not sure what relevance 'localhost' has in the context of a PREMIS XML file), we're good to go with a one-line change to utilities.inc.
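The actual call on line 110 of utilities.inc isn't shown in this thread, so for comparison here is a sketch of the equivalent request against Fedora 3's REST API, whose export method takes `format`, `context` (public, migrate or archive) and `encoding` query parameters. The base URL and the helper name `fedora_export_url()` are assumptions for illustration only.

```php
<?php
/**
 * Hypothetical helper: build a Fedora 3 REST API export URL for an object.
 * 'migrate' references Managed content by URL (foxml:contentLocation);
 * 'archive' embeds it as base64 in foxml:binaryContent.
 */
function fedora_export_url(string $fedora_base, string $pid, string $context = 'migrate'): string {
  $query = http_build_query([
    'format' => 'info:fedora/fedora-system:FOXML-1.1',
    'context' => $context,
  ]);
  // PIDs like changeme:20 are left as-is; ':' is allowed in a path segment.
  return rtrim($fedora_base, '/') . '/objects/' . $pid . '/export?' . $query;
}

// Assumed local Fedora install; adjust to the repository Islandora is configured to use.
print fedora_export_url('http://localhost:8080/fedora', 'changeme:20') . "\n";
```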
Oh. This makes sense then, because I was getting my FOXML from the web administrator as well.
:+1: on the proposed solution. I think the value presented there is more representative of "where" the content actually is.
Nice catch and the contentLocation values look more meaningful to me. I'm guessing that the localhost:8080 is what's defined in the fedora.fcfg file?
Looks good!
@dmoses I think the hostname is the one configured in the main Islandora module's admin settings, since that's where Islandora deposited the content. One 'feature' of the PREMIS module is that since it is generated on the fly, it's a snapshot of what Islandora thinks is the current location for a datastream at the time PREMIS is viewed/downloaded. Not sure if that answers your question though.
foxml:contentLocation REF value is not being selected so premis:contentLocationValue is always empty.