Closed bondjimbond closed 5 years ago
@mjordan This ensures that at least a basic Bag is present. We can deal with the specifics of different plugins another time. What do you think of this approach?
@mjordan I don't expect you'll have time to test this right away, but have you any thoughts about the approach?
I agree with this approach, but have a couple of comments:
data/
, only that the Bag must be "complete" as defined in the spec and that "Every checksum in every payload manifest and tag manifest has been successfully verified against the contents of the corresponding file." We should be careful with using 'validate' or 'valid' in reference to Bags, since they a specific meaning, which is defined at https://tools.ietf.org/html/rfc8493#page-14. Validating a Bag does not including determining if a particular file is present in data/, only that the Bag must be "complete" as defined in the spec and that "Every checksum in every payload manifest and tag manifest has been successfully verified against the contents of the corresponding file."
So that is covered by the if ($bag->isValid())
check, correct?
In terms of the language used in the messages, I think that's covered... we have "is valid" or "does not validate" based on the check above, and the message from the checks I added just says "is missing files". Does that all check out?
I think "We can deal with the specifics of different plugins another time." is reasonable but dangerous in that if your users start enabling additional Islandora Bagit plugins and then creating Bags for deposit into WestVault, you risk having Bags there that might not meet your criteria.
Can you elaborate on the last bit (bags that might not meet my criteria)?
For an MVP, assuming this PR passes testing besides my own, I think this brings us there... but we will certainly want to add some improvements over time to support better checking with specific plugins.
So that is covered by the if ($bag->isValid()) check, correct?
Yes, but that's all it does.
Can you elaborate on the last bit (bags that might not meet my criteria)?
Your criteria are that the Bag contains a specific set of files, e.g., OBJ.something, MODS.xml, etc. which Bag validation doesn't address.
Your criteria are that the Bag contains a specific set of files, e.g., OBJ.something, MODS.xml, etc.
Well, that's what the checks introduced here are for... We check the bag contents for files, and compare against the list of datastreams. In theory, this code ensures that every Bag contains all expected files. It's possible due to plugin problems that they could be in the wrong subdirectories, perhaps, but other than that, the minimum criteria of "each datastream has a file" should be met here.
If the other plugins insert extra files that I'm not accounting for, that's a later step.
Unless I'm missing something? Or are we both saying the same thing but from different angles?
We're saying the same thing. I'll try test this over the weekend.
Cool, thanks! Unless I'm mistaken, once this is done we're pretty much set for a beta release, yes?
Any chance you had an opportunity to test this, @mjordan?
@bondjimbond testing didn't go as planned. After modifying the drush sript, for the "Un-preserve your object, then preserve again" step, I unchecked the "Preserve this object" box, saved, then checked it and resaved. Then reran drush -u 1 westvault-bagit --debug
. There was no error, drush showed green:
[...]
Bag created and saved at sites/default/files/Bag-islandora_7.tgz [1.08 sec, 35.43 MB] [status]
Files added: RELS-EXT.rdf, MODS.xml, DC.bin, OBJ.jpg, TECHMD.xml, TN.jpg, MEDIUM_SIZE.jpg, PRESERVATION.bin [1.08 sec, 35.43 MB] [status]
[...]
The bag was still in my owncloud directory. I also repeated these steps after clearing my cache.
Did I test it correctly?
Hmm, did you delete the bag from the Owncloud directory before making the modification? (I should have included that in my testing steps, sorry.)
Also, try adding dd($bag_contents)
at line 194 and check what's in position ['zero']
; if that's not one of the datastream files, then the check won't pick it up. So instead pick whichever array key corresponds to a more important file.
Sorry, still not getting it. I added the following to the drush command:
// FOR TESTING ONLY
unset($bag_contents['3']);
which should be removing the OBJ, but it's not. When I unzip the resulting bag, the OBJ is there. Consequently, there is no error.
The idea here is that it unsets the file from the array that reports the files in the bag and before the two arrays are compared. It doesn't actually erase the OBJ, it just tricks the module into thinking the OBJ is missing.
So what should be happening:
If you followed all the steps above, but the newly-created Bag is not actually being deleted -- can you check the object's PRESERVATION datastream and report its contents?
Followed the steps (plus deleting the bag before making the modification to the drush command), still no dice. There is no logged error in the drush output or in watchdog. The bag file is regenerated. Here is the object's PRESERVATION ds contents:
`
`
Hmm... Can you report the result of dd($bag_contents)
(put right after the line unset($bag_contents[0])
, and dd($datastreams)
right afterward?
Added those lines, unpreserved, then represerved the object, and reran the drush command:
Array
(
[1] => MEDIUM_SIZE
[2] => MODS
[3] => OBJ
[4] => PRESERVATION
[5] => RELS-EXT
[6] => TECHMD
[7] => TN
[zero] => DC
)
Array
(
[1] => RELS-EXT
[2] => MODS
[3] => DC
[4] => OBJ
[5] => TECHMD
[6] => TN
[7] => MEDIUM_SIZE
[8] => PRESERVATION
)
Heading to work on, can pick this up this evening.
So then after that you unset $bag_contents[3]
, right? But before the comparison script?
Here are my changes:
191 // Prevents an incorrect 'false' result in the array_search below.
192 $bag_contents['zero'] = $bag_contents[0];
193 unset($bag_contents[0]);
194 dd($bag_contents);
195 dd($datastreams);
196 // Compare $datastreams to $bag_contents.
and
205 // FOR TESTING ONLY
206 unset($bag_contents['3']);
207 if (!isset($is_valid)) {
208 $is_valid = TRUE;
209 }
@mjordan Crap... Did I give you the wrong line for the unset
?
It should go right before this block of code:
// Compare $datastreams to $bag_contents.
foreach ($datastreams AS $datastream) {
$datastream_locator = array_search($datastream, $bag_contents, TRUE);
if ($datastream_locator == FALSE) {
drush_log(dt("Bag at !path missing the !datastream datastream.", array('!path' => $bag_path, '!datastream' => $datastream), 'warning'));
$is_valid = FALSE;
}
}
Is it the "FOR TESTING ONLY" unset() that I put there?
Yep. Stick that right above the // Compare $datastreams to $bag_contents.
comment and it should work.
OK, that prevented the bag from being created, and Drush told me "Bag /tmp/owncloud/Bag-islandora_7.tgz was is missing files and will be removed. Check directory permissions and try again." Is that the error I am looking for? If so, I recommend writing an entry to the watchdog since an admin might not know that the bag was not created.
Or maybe some other even more in-your-face way of telling the admin there was a problem, like an email. Islandora Checksum Checker, for example, sends an email on checksum mismatch.
@mjordan Yes, that's the error you want. So how would you suggest it get logged... Watchdog?
I think Watchdog at a minimum.
Let's do Watchdog for now. Once this is merged, another issue for emailing the admin.
With the latest commit, would you say this passes your tests?
Yup!
Thanks!
Here's how Checksum Checker does it:
Addresses #22
Adds validation functions to check bag contents against the object's datastreams.
If any datastream is not represented by a file in the Bag, the error is logged and the Bag is deleted. The object's PRESERVATION datastream is not updated, so it will still be ready to be preserved next time.
To test:
drush -u 1 westvault-bagit --debug
islandora_westvault.drush.inc
and add the following at line 196:drush -u 1 westvault-bagit --debug