SFULibrary / islandora_datastream_crud

Islandora Drush module for performing Create, Read, Update, and Delete operations on datastreams.
GNU General Public License v3.0

DC Datastream not updated from MODS when updating MODS datastream #17

Closed nate-rcl closed 7 years ago

nate-rcl commented 7 years ago

When the datastream being updated is MODS, it seems like a good idea to update the DC by default as well. A flag that turns this off might be needed, but it would be highly unusual to have a MODS file whose metadata does not match the DC metadata.

mjordan commented 7 years ago

@nate-rcl agreed but some sites use metadata schemas other than MODS. I think we do want to do what you describe by default but I do not have any good ideas for accommodating non-MODS datastreams.

alehman-loc commented 7 years ago

I am also using this module to update MODS datastreams for DPLA aggregation and need to trigger a standard DC update for consistency/display. Even if you don't accommodate other "custom" datastreams and crosswalks, this seems like the normal functioning in Islandora: when you update MODS it updates the DC.

Thanks so much for your time on this Mark, it's already saving me loads of work!!

mjordan commented 7 years ago

@amandarl thanks for the input, I'll revisit this in light of recent work done on #34 (and some discussion in #18), where we accommodate non-MODS source datastreams for updating the object label. However, IIRC regenerating DC is a bit tricky if the site uses nonstandard XSLT (i.e., not the standard LoC XSLT) to generate the DC. I'll take a look.

mjordan commented 7 years ago

There are three places where the Library of Congress MODS-to-DC stylesheet exists:

They're all identical (on the Vagrant anyway) but we'll either need to:

I'm leaning toward the second option, which should allow users to update DC from DDI or some other non-MODS source datastream. Thoughts anyone?

mjordan commented 7 years ago

Commands would look like this:

To update DC from MODS using default XSLT at islandora_xml_forms/builder/transforms/mods_to_dc.xsl (same command as used now to push MODS datastreams because this behavior is the default):

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/mods_datastreams

To update DC using non-default XSLT:

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/mods_datastreams --dc_transform=/path/to/my/custom/stylesheet

To not update DC:

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/mods_datastreams --update_dc=false

Maybe a prompt to the user "Do you want to update your DC datastreams from the XXX datastream files you are pushing (y/n)?" might be the safest? It could be bypassed by using drush's -y option.

To update DC from DDI using another XSLT:

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/ddi_datastreams --dc_transform=/path/to/my/custom/dditodc_stylesheet

I'm still struggling with the non-MODS source datastreams. Specifically, how does Datastream CRUD know it should attempt to update DC? What would prevent it from attempting to do so if the user were pushing up OBJ or other non-metadata datastreams? The only strategy I can think of is that it only attempts to update the DC datastream if the datastream files in --datastreams_source_directory end in .xml. To handle the case where the user is pushing datastream files that are not metadata but do end in .xml (SFU has a real use case for this, sorry!), they can use the --update_dc=false option (or say no to a prompt). If they forget to add that (which is going to happen eventually, and is potentially damaging now that the default behavior is to update the DC), the update action would presumably fail because the transform would do nothing. (Hmm, I'm thinking a prompt is the safest.) If it did not fail and accidentally destroyed the DC, they could always revert the DC, e.g. by running

drush islandora_datastream_crud_fetch_datastreams --user=admin --pid_file=/tmp/imagepids.txt --dsid=MODS --datastreams_directory=/tmp/imagemods --datastreams_version=1

Sorry to make this sound so complicated, but it is!
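To make the ".xml only" strategy above concrete, here is a minimal sketch in Python. It is an illustration only, not the module's code (the module itself is PHP). The function name `looks_like_xml_push` and the file names in the usage note are hypothetical, and the choices to require every file to end in .xml and to ignore case are assumptions that the thread does not settle.

```python
from pathlib import Path


def looks_like_xml_push(file_names):
    """Return True only if there is at least one file and every file ends in .xml.

    Assumptions (not confirmed by the thread): the check is
    case-insensitive, and a single non-XML file disables the DC update
    for the whole push.
    """
    paths = [Path(name) for name in file_names]
    return bool(paths) and all(p.suffix.lower() == ".xml" for p in paths)
```

For example, `looks_like_xml_push(["a.xml", "b.xml"])` is true, while a directory listing containing `a.jp2` is rejected. Note that this check cannot tell metadata XML from non-metadata XML, which is exactly the case where --update_dc=false (or answering no to a prompt) is still needed.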

mjordan commented 7 years ago

@nate-rcl and @amandarl, and anyone else interested in this feature (maybe @DiegoPino and @giancarlobi?), I'd appreciate some feedback at this point.

I've got the overall logic/prompts to allow for the updating of DC from MODS or other datastreams. It is basically as described in the previous comment, but I'll run through it here to make sure it's satisfactory. Running

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/issue17mods

will produce the same prompt it does now:

You are about to push datastreams to objects in your repository. This will create new versions of the datastreams, or create new datastreams if none exist. Do you want to want to continue? (y/n):

If you reply y, your datastream files will get pushed, as they do now, but you will get an additional prompt:

Do you want to update each object's DC datastream using the new MODS? (y/n):

If you reply y, the DC datastream of each object represented by a pushed MODS datastream will be regenerated; if you reply n, your MODS will get pushed as it does now but no DC regeneration happens.

If you provide the option --update_dc=0 in your drush command (I'm still figuring out how to make this option a bit less cryptic and it might change in the final version), the prompt to regenerate DC is skipped, and no regeneration happens. If you provide the option -y in your drush command, the regeneration happens without prompting you (-y is the standard drush flag indicating "yes" to all prompts).

If your datastream files don't end in .xml, the prompt to regenerate DC is skipped (in other words, you're pushing images, PDFs, or some other non-XML files), and no regeneration happens.
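The rules described so far can be summarised as a small decision function. This is a hedged Python sketch of the described behaviour, not the module's actual PHP implementation. The names `DcAction` and `decide_dc_action` are hypothetical, and it assumes that --update_dc=0 takes precedence over -y when both are given, which is how a later comment in this thread describes combining them.

```python
from enum import Enum


class DcAction(Enum):
    SKIP = "skip"              # no prompt, no DC regeneration
    PROMPT = "prompt"          # ask the user whether to regenerate DC
    REGENERATE = "regenerate"  # regenerate DC without asking


def decide_dc_action(files_are_xml, update_dc=True, assume_yes=False):
    """Map the push options to what happens with the DC datastream.

    files_are_xml: True if the pushed datastream files end in .xml
    update_dc:     False models --update_dc=0
    assume_yes:    True models drush's -y option
    """
    # Non-XML files or an explicit opt-out skip regeneration entirely.
    if not files_are_xml or not update_dc:
        return DcAction.SKIP
    # -y answers "yes" to every prompt, including the DC one.
    if assume_yes:
        return DcAction.REGENERATE
    return DcAction.PROMPT
```

With the defaults, pushing .xml files yields `DcAction.PROMPT`; adding `assume_yes=True` yields `DcAction.REGENERATE`; and `update_dc=False` always yields `DcAction.SKIP`.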

Does that meet everyone's needs?

giancarlobi commented 7 years ago

Hi Mark,

I don't use MODS, but I think CRUD is a really powerful module for managing objects, so every improvement is welcome. Thanks a lot, Mark!!

As a batch scripter, I prefer options to prompt/answer. IMHO the logic could be:

Just my point of view.

Giancarlo


mjordan commented 7 years ago

@giancarlobi I was in a conversation yesterday with @bondjimbond about how common it is for sites not to use MODS, so thanks for your example.

With regard to a default behavior for updating DC, it seems your preference is contrary to that of @nate-rcl and @amandarl. Since we can't have two different default behaviors, I'm tending toward their preference. Sorry! Currently, not updating DC and bypassing all prompts can be achieved by using --update_dc=0 and -y in the drush command. I've got this all in place now and will be doing some further testing over the next couple of days.

Of course if you wanted to update your DC from non-MODS you'd need to specify the path to an XSLT stylesheet on your server, e.g., --dc_transform=/path/to/my/custom/dditodc_stylesheet.xslt. CRUD will detect the DSID you are pushing automatically.
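For batch scripting, the option combinations discussed in this thread could be assembled by a small wrapper. The following Python sketch is only an illustration: `build_push_command` is a hypothetical helper, it uses only the options mentioned above (--user, --datastreams_source_directory, --update_dc=0, --dc_transform, and -y), and it assumes drush accepts -y after the command name.

```python
def build_push_command(source_dir, user="admin", update_dc=True,
                       dc_transform=None, assume_yes=False):
    """Build the argument list for a push, ready for subprocess.run()."""
    argv = [
        "drush",
        "islandora_datastream_crud_push_datastreams",
        f"--user={user}",
        f"--datastreams_source_directory={source_dir}",
    ]
    if not update_dc:
        # Skip the DC prompt and the regeneration entirely.
        argv.append("--update_dc=0")
    elif dc_transform:
        # A custom stylesheet only matters when DC is being updated.
        argv.append(f"--dc_transform={dc_transform}")
    if assume_yes:
        # Answer "yes" to all remaining prompts.
        argv.append("-y")
    return argv
```

For instance, `build_push_command("/tmp/mods_datastreams", update_dc=False, assume_yes=True)` produces the "no DC update, no prompts" invocation described above, and passing `dc_transform="/path/to/my/custom/dditodc_stylesheet.xslt"` covers the DDI case. Returning a list rather than a single string avoids shell quoting problems if a path contains spaces.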

ubermichael commented 7 years ago

Success!

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/mods_datastreams worked as described. I'll test out the non-default XSLT in a little while (once I've written one).

mjordan commented 7 years ago

@ubermichael sweet, thanks.

mjordan commented 7 years ago

I've merged this into 7.x with 3a915989e78199481018a5cc04a8e9fdab076ab7. Thanks everyone for helping with this feature.