asulibraries / islandora-repo

ASU Digital Repository on Islandora
GNU General Public License v2.0

Add an export as CSV option #333

Closed elizoller closed 3 years ago

elizoller commented 3 years ago

Since it's clear some folks will still work in and out of CSVs, it'd be nice to have an export to CSV that would give us the data formatted the same way we expect them to import it back in (i.e. matching the spreadsheet template). Two options that might get us there -

elizoller commented 3 years ago

https://docs.google.com/spreadsheets/d/10RRT0wZ0A3oSDQDMSG0K7QC-QYhk7dgTqbuwse7G2sU/edit?pli=1#gid=0

wgilling commented 3 years ago

Multiple instances of a field should be in separate CSV columns so that metadata users can easily work on them. This means that, when their updated CSV is imported, the multiple columns will need to be merged back into one field. This could potentially be done during the "preprocess import" routine for the CSV import plugin. This blog describes a way to use prepareRow to achieve this: https://www.mediacurrent.com/blog/migration-custom-values-drupal-8/
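The merge step described above can be sketched outside Drupal. This is a minimal Python illustration of the logic only, not the prepareRow implementation: the `_1`, `_2` column suffixes, the `|` delimiter, and the function name `merge_repeated_columns` are all assumptions made for the example.

```python
import csv
import io
import re


def merge_repeated_columns(row, delimiter="|"):
    """Collapse columns like 'subject_1', 'subject_2' into one delimited value."""
    merged = {}
    for column, value in row.items():
        # Assumed naming convention: repeated fields carry a numeric suffix.
        base = re.sub(r"_\d+$", "", column)
        if value is None or value.strip() == "":
            continue  # skip the blank padding columns
        merged.setdefault(base, []).append(value.strip())
    return {field: delimiter.join(values) for field, values in merged.items()}


# Hypothetical CSV as a metadata editor might return it.
sample = io.StringIO(
    "title,subject_1,subject_2,subject_3\n"
    "Desert plants,Botany,Arizona,\n"
)
rows = [merge_repeated_columns(r) for r in csv.DictReader(sample)]
print(rows[0])
```

In a real migration the merged value would then be split by whatever process plugin the import already uses for multi-valued fields.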

wgilling commented 3 years ago

Upon review of entity_export_csv, there are many things that I feel it wouldn't handle. These include:

- taxonomy term lookup, rather than populating with the $tid value
- note type handling, where only one is the Statement of responsibility
- export a field based on attribute of relationship (for Contrib, Corp body, Person)
- handling of export based on field_model for "Collections" (from either "Primary member of" or any "additional membership"), considering "Complex object children" also use "Primary member of" as their "Collection parent object" instead of collection.

For taxonomy term fields (Subjects, Linked agents, etc.), the $tid integer value could not be used by the metadata editors.

The views_data_export view may be more promising. The specific problem fields could be handled by writing formatter plugins. Attached is the CSV that was created using many of the asu_repository_item fields "out of the box", although, as of yet, I am not sure how multiple-value fields would be handled.
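To make the two open questions concrete (term labels instead of $tid values, and multiple-value fields), here is a rough Python sketch of what a formatter would need to produce. It is not views_data_export code: the `TERM_LABELS` table, the `subject_N` column names, and both function names are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical stand-in for a taxonomy lookup; real labels would come from Drupal.
TERM_LABELS = {101: "Botany", 102: "Arizona", 205: "Smith, Jane"}


def spread_field(name, tids, width):
    """Map term IDs to labels and pad with blanks up to `width` columns."""
    labels = [TERM_LABELS.get(tid, "") for tid in tids]
    labels += [""] * (width - len(labels))
    return {f"{name}_{i}": label for i, label in enumerate(labels, start=1)}


def export_rows(items, field="subject"):
    """Write one CSV row per item, one column per value of `field`."""
    width = max((len(item[field]) for item in items), default=0)
    header = ["title"] + [f"{field}_{i}" for i in range(1, width + 1)]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({"title": item["title"], **spread_field(field, item[field], width)})
    return out.getvalue()


items = [
    {"title": "Desert plants", "subject": [101, 102]},
    {"title": "Oral history", "subject": [205]},
]
csv_text = export_rows(items)
print(csv_text)
```

The column count is taken from the item with the most values, so every row has the same shape and shorter rows are padded with empty cells.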

wgilling commented 3 years ago

The file was created by going to the path "items/5454/csv?_format=csv" and it was named "csv_export_2021-03-09.csv", but GitHub does not allow CSV file attachments, so it was simply saved as xlsx using Microsoft Excel.

wgilling commented 3 years ago

csv_export_2021-03-09.xlsx

wgilling commented 3 years ago

So far, it appears that the https://www.drupal.org/project/views_data_export module will handle our needs -- as long as we write formatters for special field handling.

In the case of a collection, we can start with a Solr "Index default solr content index" view and make use of the same contextual filter that the collection search is using.

wgilling commented 3 years ago

This is a work in progress. The two xlsx files (xlsx because GitHub does not allow CSV file attachments) are from two different views. The first file is from the single node-content view export, and its nicely formatted filename can be calculated.

current csv_export_2021-03-11.xlsx

The second xlsx is for a collection view that is based on a Solr search. Because of this, some node-based fields are not available, since they are not being indexed.

current collection csv.xlsx

wgilling commented 3 years ago

Add a migration script, "Import Standardized ASU CSV", that imports one of these CSVs and can be used with the Migrate Source UI.

wgilling commented 3 years ago

This feature may be added as a link in the Admin Toolbox... we will discuss this at the next repo Design meeting - made an issue https://github.com/asulibraries/islandora-repo/issues/352

wgilling commented 3 years ago

The resulting CSV (saved as xlsx because GitHub does not allow CSV file attachments):

csv - 2021-03-23T115957.354.xlsx

Next, I have to write an importer that will use this file (via Migrate Source UI) to see if it can update and create the objects.

wgilling commented 3 years ago

There is an issue where a title value gets encoded. It only happens to the node.field_title (paragraph). The title "A culturally competent behavioral weight loss program for adult Latinos with a BMI >30kg/m2" is being sent to the CSV with the ">" HTML-encoded, i.e. as "A culturally competent behavioral weight loss program for adult Latinos with a BMI &gt;30kg/m2".
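If the encoding in question is HTML entity escaping (an assumption; the thread does not show the encoded bytes), the effect and one way to undo it can be illustrated in Python with the standard library's `html.unescape`. The helper name `decode_title` is hypothetical, and the actual fix in Drupal would more likely be in how the field is rendered for export.

```python
import html


def decode_title(value):
    """Undo HTML entity escaping such as '&gt;' or '&amp;' in an exported cell."""
    return html.unescape(value)


encoded = (
    "A culturally competent behavioral weight loss program "
    "for adult Latinos with a BMI &gt;30kg/m2"
)
print(decode_title(encoded))
```

Decoding on export is only safe when the cell is known to be plain text. A title that legitimately contains the literal characters `&gt;` would be altered.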

wgilling commented 3 years ago

I noticed an issue with the NameURIGenerate plugin and addressed it in the NameURILookup parent class. A term value such as "Term title|authority_uri" could appear in a column whose field relates to the genre vocabulary while a term in the person vocabulary had the same authority_uri value, and the person term was being returned. The query should have been limited to the relevant vocabulary, and it wasn't.

This was fixed in the commit https://github.com/asulibraries/islandora-repo/commit/3848e32f12a771eca74506801affa5641c56a177 "adjust the NameURILookup plugin to limit results on the provided default_vocabulary (optional) and minor adjustment to the import CSV script for #333".
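The behaviour before and after that change can be illustrated with a small Python sketch. This is not the NameURILookup code: the `Term` class, the in-memory `TERMS` list, the example URI, and the `lookup_by_uri` function are all hypothetical. Only the idea of an optional `default_vocabulary` filter comes from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    tid: int
    vocabulary: str
    name: str
    authority_uri: str


# Two terms in different vocabularies that share one authority URI.
TERMS = [
    Term(1, "person", "Smith, Jane", "http://example.org/authority/123"),
    Term(2, "genre", "Interviews", "http://example.org/authority/123"),
]


def lookup_by_uri(uri, default_vocabulary=None):
    """Return the first term with this URI, optionally limited to one vocabulary."""
    for term in TERMS:
        if term.authority_uri != uri:
            continue
        if default_vocabulary is not None and term.vocabulary != default_vocabulary:
            continue  # same URI, wrong vocabulary
        return term
    return None


uri = "http://example.org/authority/123"
print(lookup_by_uri(uri))           # unfiltered: the person term comes back
print(lookup_by_uri(uri, "genre"))  # filtered: the genre term comes back
```

Keeping the vocabulary argument optional preserves the old behaviour for callers that do not pass it.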