Closed eporter23 closed 1 year ago
Preliminary Findings:
- `file` is present, but, of course, it's using that pesky `_1` numbering system. Just a reminder: our Bulkrax importing won't operate on `files`, only `file`.
- `file_type` was originally intended as a throwaway field used to help process the files into the right mounter. I had no clue that exporting was on our horizon. Is it okay if we make it a stored attribute in FileSet objects, @eporter23? Otherwise, I'd have to generate it dynamically every time "on the way out."
- `pcdm_use`, which is exposing a bug I think our Bulkrax importing has: FSs are being saved with that field unpopulated. I will have to dig into the cause of that soon.

@bwatson78 thanks for the reminder about `file` vs `files`, I always forget. With `file_type`, it's fine with me if we store it for FileSet objects, but I also expect that would require a lot of reindexing. To avoid reindexing ~300K filesets, I guess we could incorporate that into any export-related workflows so that we reindex selected ids prior to exporting.
With the pcdm_use situation, I think that Zizia (or maybe Curate) defaults that to Primary Content unless otherwise specified. I'm pretty sure Zizia only requires that it be populated if it's something different like Supplemental Content or Supplemental Preservation File. All that to say, I think it's okay if it's not populated because we can infer that it's Primary Content unless there's a value there.
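The inference described here could be sketched roughly as follows. This is only an illustration of the "blank means Primary Content" rule from the comment above, not Zizia's or Curate's actual code; the helper name and the treatment of whitespace-only values are assumptions.

```python
# Default label named in the thread; everything else here is hypothetical.
DEFAULT_PCDM_USE = "Primary Content"


def effective_pcdm_use(raw_value):
    """Return the pcdm_use label to export.

    Falls back to the default when the stored value is missing or blank,
    and otherwise returns the stored value with surrounding whitespace removed.
    """
    if raw_value is None or not str(raw_value).strip():
        return DEFAULT_PCDM_USE
    return str(raw_value).strip()
```

With this rule, an unpopulated field exports as Primary Content, while values such as Supplemental Content pass through unchanged.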
@bwatson78 I just tested the revised Visibility output and it is much improved. There are two settings that I wanted to ask about. Is it possible to adjust these?
- Public currently outputs as Open
- Emory High Download currently outputs as Authenticated
This is where we run into the issue with multiple values being assigned to multiple keys. Unfortunately, for both, these are the first keys that the system encounters when provided with the value.
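A small sketch of why the first key wins, and one possible way around it. The mapping below is hypothetical: the labels come from this thread, but the stored values (`open`, `authenticated`) and all function names are assumptions for illustration, not Curate's or Bulkrax's actual configuration.

```python
# Hypothetical importer mapping: several accepted labels resolve to one stored value.
IMPORT_LABELS = {
    "Open": "open",
    "Public": "open",
    "Authenticated": "authenticated",
    "Emory High Download": "authenticated",
}


def first_matching_label(stored_value):
    """Naive reverse lookup: return the first label whose value matches."""
    for label, value in IMPORT_LABELS.items():
        if value == stored_value:
            return label
    return None


# One possible fix: an explicit export map with a single preferred label per value.
EXPORT_LABELS = {
    "open": "Public",
    "authenticated": "Emory High Download",
}


def preferred_label(stored_value):
    """Use the preferred label when defined, else fall back to the naive lookup."""
    return EXPORT_LABELS.get(stored_value, first_matching_label(stored_value))
```

Under these assumptions, `first_matching_label("open")` yields Open because that key is encountered first, while `preferred_label("open")` yields Public. Both labels would still import to the same stored value.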
Initial testing was looking great for the multi-valued fields, but I just now exported a portion of a book that had been imported by Bulkrax (if that has any significance - the other exports had not been). I'm seeing `date_created` and `date_created_1` as well as `holding_repository` and `holding_repository_1`. Will send you the CSV in Slack so you can see.
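One way to spot this kind of mixed numbering in an exported CSV header row is to group column names by their base name. This is a rough sketch that assumes numbered columns follow a `name_N` pattern; the function names are hypothetical and this is not Bulkrax code.

```python
import re
from collections import defaultdict

# Matches headers that end in an underscore plus digits, e.g. "date_created_1".
NUMBERED = re.compile(r"^(?P<base>.+)_(?P<index>\d+)$")


def group_numbered_headers(headers):
    """Group CSV headers by base name, so `date_created` and
    `date_created_1` land in the same bucket."""
    groups = defaultdict(list)
    for header in headers:
        match = NUMBERED.match(header)
        base = match.group("base") if match else header
        groups[base].append(header)
    return dict(groups)


def mixed_numbering(headers):
    """Return base names that appear both bare and with a numeric suffix."""
    return sorted(
        base
        for base, columns in group_numbered_headers(headers).items()
        if base in columns and len(columns) > 1
    )
```

For example, passing the header list `["title_1", "date_created", "date_created_1"]` to `mixed_numbering` flags only `date_created`, since `title` never appears without a suffix.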
@bwatson78 for those 2 visibility values, do you know if the importer would accept those and translate them correctly? As in, if we exported and fed it back in, would `Open` be interpreted as "Public"?
Yes, they would.
@bwatson78 All of the items in this ticket are looking great.
After testing basic export capabilities for metadata, we noticed that the exported CSV structure differs from our importer CSV structure.
We want to ensure that Curate users can use these exported CSVs to do metadata cleanup and then re-import them later.
Current issues:
- `file` column does not exist; file IDs are instead stored in a `children` column. Ideally we want to export the original filenames and not the ID for the fileset.
- `file_type` is not present
- `pcdm_use` is not present
- `visibility` field values should translate back to our labels (e.g. Private, Emory High Download, Public Low Res, etc.)