Closed by mjordan 10 years ago
I think including Internal makes sense. We may be able to use the DC for populating a basic Rights PREMIS statement? The Redirected datastream type could be used in the context of a revised video or audio solution pack. The problem with External is that Fedora is really only managing the access to the content. How would checksums, etc. be run and reported on when the file isn't actually stored in the repo? It would be good to do some testing there.
True about Redirected datastreams, and I'm also wondering if we should include them as object entities in PREMIS just for the sake of completeness (i.e., to identify every datastream in an Islandora object)? Same goes for External. There would never be any fixity check events on these two types, but if we end up including addDatastream, modifyDatastream, etc. events (from issue #13), we could include those events for these two types of datastreams.
Rights entities would be good to have, but I'm a bit confused about where Rights info comes from in most Islandora instances. The basic MODS entry form doesn't contain any mods:accessCondition elements that I can see, and since DC is derived from MODS, I don't know where we'd get this information. I'm probably missing something obvious. Where do you guys populate rights info in your metadata?
I don't think the default forms have rights elements. But, we have added them for a bunch of our forms. Here is an example record.
Thanks for the example. Even if there's nothing in the dc:rights element, the worst that will happen in the output document is an empty premis:rightsStatement entity (or maybe we want to put some sort of generic XML comment in the rightsStatement saying that we don't have any rights info for that object). Or we could even do a conditional transform: if there's no dc:rights, don't output anything.
How hard would it be to pull from mods_accessCondition_useAndReproduction_ms in Solr? Or would hitting Solr be too much overhead?
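A minimal sketch of the conditional option, for illustration only. It assumes dc:rights is reachable anywhere in the input document, and it wraps the copied element in premis:rightsExtension as a placeholder rather than building a full premis:rightsStatement, which needs further required children in PREMIS 2.x. It is not the code in foxml_to_premis.xsl.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="info:lc/xmlns/premis-v2">

  <xsl:template match="/">
    <!-- Iterate only over dc:rights elements that contain non-whitespace text.
         If there are none, the loop body never runs and no rights entity is written. -->
    <xsl:for-each select="//dc:rights[normalize-space(.) != '']">
      <rights>
        <!-- Placeholder wrapper (assumption): carries the original dc:rights element as-is. -->
        <rightsExtension>
          <xsl:copy-of select="."/>
        </rightsExtension>
      </rights>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Testing `normalize-space(.) != ''` rather than mere existence means an empty `<dc:rights/>` is treated the same as a missing one.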
Using a PHP extension function, you can do anything in your stylesheet that PHP can do. If we added one to foxml_to_premis.xsl we could have code in the module grab the data from Solr, munge it if necessary, and then pass it into foxml_to_premis.xsl as a stylesheet parameter. I had some difficulty passing entire chunks of XML into the stylesheet as parameters but apparently it's possible with some DOM-foo. We'll need to do this sort of thing if we want to include FITS data in our PREMIS, for example, so I'll see if I can make some headway in a "Hello XSLT output from external XML blob world" script.
This thread is getting a bit off topic but we need to deal with this stuff anyway.
Got a basic script working that passes an entire XML document into the XSLT output (using output from FITS as my test) as referenced in my previous comment, without using a PHP extension. Bad news, however: PHP is not able to pass strings into an XSL stylesheet if they contain both double and single quotes (bug reference https://bugs.php.net/bug.php?id=64137).
tl;dr is that unless we are sure our foreign XML chunk (FITS, mods:accessCondition, whatever) only contains one type of quote or the other, we can't pass it into the stylesheet for inclusion in the PREMIS output. This is unfortunate if we can't find a workaround. A hack using XSLT's concat() function is supplied in the PHP bug report but I haven't fully figured out how it can help us in this case.
Anyone got any experience working around this problem?
One workaround would be to insert the foreign XML into the FOXML before we applied the stylesheet.
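The stylesheet side of that workaround might look like the sketch below. It assumes the module has already appended the foreign XML (FITS output, for example) to the FOXML root inside a wrapper element named `foreignXml` in no namespace. That wrapper name is hypothetical, and the choice of premis:objectCharacteristicsExtension as the destination is also an assumption.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    xmlns="info:lc/xmlns/premis-v2"
    exclude-result-prefixes="foxml">

  <xsl:template match="/foxml:digitalObject">
    <!-- 'foreignXml' is a hypothetical wrapper added to the FOXML before the transform.
         An unprefixed name in an XSLT 1.0 path matches elements in no namespace. -->
    <xsl:for-each select="foreignXml">
      <objectCharacteristicsExtension>
        <!-- Copy the wrapper's children verbatim. No stylesheet parameter is involved,
             so quote characters in the foreign XML never pass through PHP's parameter handling. -->
        <xsl:copy-of select="*"/>
      </objectCharacteristicsExtension>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the foreign XML arrives as nodes in the source tree rather than as a string parameter, the quoting limitation described in the PHP bug report should not come into play.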
We use mods:accessCondition - http://www.islandscholar.ca/fedora/repository/ir%3Air-batch6-5794 . Rights statements aren't included in all records though. I'm scaffolding some premis rights code and will share that shortly.
Closing a sub-issue here by integrating @dmoses' XSL work on transforming dc:rights to premis:rights in commit 524c530. Creating new issue (#16).
WRT the question about External and Redirected datastreams, turns out that FC does do checksums on them. I added one of each to an object via the Fedora Web Admin and then ran Checksum Checker on the object. Here are the audit log records:
<!-- External datastream -->
<audit:record ID="AUDREC121">
<audit:process type="Fedora API-M"/>
<audit:action>modifyObject</audit:action>
<audit:componentID></audit:componentID>
<audit:responsibility>fedoraAdmin</audit:responsibility>
<audit:date>2014-01-08T02:11:06.747Z</audit:date>
<audit:justification>PREMIS:file=http://edocs.lib.sfu.ca/projects/Cartoons/thumbnails/1-1955-01-07.gif; PREMIS:eventType=fixity check; PREMIS:eventOutcome=SHA-1 checksum validated.
</audit:justification>
</audit:record>
<!-- Redirect datastream -->
<audit:record ID="AUDREC122">
<audit:process type="Fedora API-M"/>
<audit:action>modifyObject</audit:action>
<audit:componentID></audit:componentID>
<audit:responsibility>fedoraAdmin</audit:responsibility>
<audit:date>2014-01-08T02:11:06.876Z</audit:date>
<audit:justification>PREMIS:file=http://edocs.lib.sfu.ca/cgi-bin/Cartoons?CartoonID=6186; PREMIS:eventType=fixity check; PREMIS:eventOutcome=SHA-1 checksum validated.
</audit:justification>
</audit:record>
So if we added all types of datastreams (have Managed now, should add Internal, could add Redirect and External), we'd have checksum verification on them all.
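As a rough illustration of how records like the two above could become PREMIS events, here is a sketch that pulls the values out of audit:justification with XSLT 1.0 string functions. The eventIdentifierType and linkingObjectIdentifierType labels are made up for the example, and this is not necessarily how foxml_to_premis.xsl does it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:audit="info:fedora/fedora-system:def/audit#"
    xmlns="info:lc/xmlns/premis-v2"
    exclude-result-prefixes="audit">

  <!-- Match only audit records whose justification reports a fixity check. -->
  <xsl:template match="audit:record[contains(audit:justification, 'PREMIS:eventType=fixity check')]">
    <!-- Collapse the trailing newline and any extra whitespace before slicing. -->
    <xsl:variable name="j" select="normalize-space(audit:justification)"/>
    <event>
      <eventIdentifier>
        <!-- Hypothetical type label. -->
        <eventIdentifierType>Fedora audit record ID</eventIdentifierType>
        <eventIdentifierValue><xsl:value-of select="@ID"/></eventIdentifierValue>
      </eventIdentifier>
      <eventType>fixity check</eventType>
      <eventDateTime><xsl:value-of select="audit:date"/></eventDateTime>
      <eventOutcomeInformation>
        <!-- Everything after 'PREMIS:eventOutcome=' is the outcome text. -->
        <eventOutcome><xsl:value-of select="substring-after($j, 'PREMIS:eventOutcome=')"/></eventOutcome>
      </eventOutcomeInformation>
      <linkingObjectIdentifier>
        <!-- Hypothetical type label. -->
        <linkingObjectIdentifierType>PREMIS:file value</linkingObjectIdentifierType>
        <!-- The text between 'PREMIS:file=' and the first semicolon. -->
        <linkingObjectIdentifierValue><xsl:value-of select="substring-before(substring-after($j, 'PREMIS:file='), ';')"/></linkingObjectIdentifierValue>
      </linkingObjectIdentifier>
    </event>
  </xsl:template>

  <!-- Suppress text nodes that the built-in templates would otherwise copy to the output. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

One caveat: cutting the file value at the first semicolon would truncate a URL that itself contains a semicolon, so a real implementation may need a stricter delimiter such as `'; PREMIS:eventType='`.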
Given this, should we go ahead and add fixity check audit records for all types of datastreams to our PREMIS output?
:+1:
Here's a sample of a PREMIS file containing audit records for all types of datastreams: https://gist.github.com/mjordan/8310775
Whoops, I notice that the records for the External and Redirect datastreams are not included. Records for Internal are, though. Let me work on it.
If we include External and Redirect, we need to rethink the test for composition level in foxml_to_premis.xsl:
<xsl:choose>
<xsl:when test="starts-with(@ID, 'OBJ') or starts-with(@ID, 'MODS')">
<compositionLevel>0</compositionLevel>
</xsl:when>
<xsl:otherwise>
<compositionLevel>1</compositionLevel>
</xsl:otherwise>
</xsl:choose>
as we don't know the composition level of the external or redirected files. In fact, looking at compositionLevel's entry in the PREMIS 2.2 Data Dictionary, I'm not sure we're using it properly -- it seems to apply to compressed or encrypted files and bitstreams. Maybe we should just use the default, '0'.
OK, I am ready to merge my feature branch that includes events for all types of datastreams into 7.x. Please vote!
Still need to work on issue #21.
I'm with you on interpreting compositionLevel as 0. Found this in Yale's guide:
"Supply value even when object is uncompressed and unencrypted, e.g., assign 0 for base level, 1 for compressed file, 2 for compressed and encrypted file."
Thanks for the interpretation from Yale's guide. Yes to defaulting compositionLevel to 0, and yes to including events for all types of datastreams.
OK, I'll merge in my feature branch that removes the Managed-only pattern, and I'll also remove the compositionLevel logic.
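For reference, a sketch of what the simplified object template could look like once both changes are made. This is an illustration under assumptions, not the contents of the merged branch: the objectIdentifierType label is made up, and the other children that objectCharacteristics requires (format, for example) are elided.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="info:lc/xmlns/premis-v2"
    exclude-result-prefixes="foxml">

  <!-- No CONTROL_GROUP predicate on the parent foxml:datastream, so versions of
       Managed (M), Internal (X), External (E) and Redirect (R) datastreams all match. -->
  <xsl:template match="foxml:datastreamVersion">
    <object xsi:type="file">
      <objectIdentifier>
        <!-- Hypothetical type label. -->
        <objectIdentifierType>Fedora datastream version ID</objectIdentifierType>
        <objectIdentifierValue><xsl:value-of select="@ID"/></objectIdentifierValue>
      </objectIdentifier>
      <objectCharacteristics>
        <!-- Fixed at 0 for every datastream; the xsl:choose on @ID is gone. -->
        <compositionLevel>0</compositionLevel>
        <!-- fixity, size and format elided in this sketch -->
      </objectCharacteristics>
    </object>
  </xsl:template>

  <!-- Suppress stray text nodes from the built-in templates. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```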
Mind if I close this? (Done in commit b2b9636d5050de29c22d0589c4964bf02f4b4f3a)
If we do close it, we'll need to update the README, which mentions Managed datastreams.
We should create a ticket for compositionLevel.
Currently, only Managed datastreams are selected for inclusion in the PREMIS XML. Should we be including Internal, External, and Redirect datastreams as well? Probably Internal (for example, DC is internal), but are External and Redirect datastreams used at all in Islandora?