emory-libraries / dlp-selfdeposit


Metadata transforms batch 2 - complex transforms #321

Closed eporter23 closed 22 hours ago

eporter23 commented 1 month ago

The following fields may require more complex transforms.

The data for these fields can be found in the descMetadata (MODS) XML file for each exported work.

Please see the filtered view of the worksheet and note the OE xpath column as a starting point (though the xpath may need to be improved).
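As a rough illustration of applying an xpath from the worksheet to an exported descMetadata (MODS) file, here is a minimal sketch using Python's standard library. The sample XML, the `extract_values` helper, and the `mods:titleInfo/mods:title` path are assumptions for illustration only; the real paths come from the OE xpath column and may need refinement.

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace; the prefix "mods" is only a local alias.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical, heavily trimmed descMetadata content (not a real exported record).
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo>
    <mods:title>Example title</mods:title>
  </mods:titleInfo>
</mods:mods>"""


def extract_values(mods_xml, xpath):
    """Return the non-empty text values matched by an xpath in a MODS document."""
    root = ET.fromstring(mods_xml)
    values = []
    for element in root.findall(xpath, MODS_NS):
        text = (element.text or "").strip()
        if text:
            values.append(text)
    return values


# Placeholder path; swap in the xpath from the worksheet for each field.
titles = extract_values(SAMPLE_MODS, "./mods:titleInfo/mods:title")
print(titles)
```

ElementTree supports only a subset of XPath, so the more complex expressions in the worksheet may need a fuller XPath library or some post-processing in Python.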

Xpath/treatment still TBD (and may move to a pt.3 ticket):

eporter23 commented 5 days ago

To do: find a record that has something other than Final Published Version to check out the 3 values:

Example of "Preprint": emory:cr9m9

Example of "Post-print": emory:rr591

SOLR q: type:*print*
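For anyone repeating this search from a script, the query above could be sent to Solr roughly as follows. This is only a sketch: the base URL and core name are placeholders, and the field list assumes the `type` and `dsids` fields mentioned in this thread are stored and returnable.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; replace with the real Solr host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/hypothetical_core/select"


def build_solr_query_url(query, fields=("type", "dsids"), rows=10):
    """Build a Solr select URL for the given query string."""
    params = {
        "q": query,               # e.g. type:*print*
        "fl": ",".join(fields),   # fields to return (assumed to exist in the index)
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_solr_query_url("type:*print*")

# Round-trip check: the encoded URL still carries the original query.
decoded_query = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_query)
```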

eporter23 commented 1 day ago

@bwatson78 can you run your script on the above IDs?

eporter23 commented 22 hours ago

We ran into some issues with datastreams/IDs not being accessible in Fedora 3. Noting this in case it recurs when doing large-scale exports in the future.

eporter23 commented 3 hours ago

Example: this object exists in SOLR and has state "A", but returns the following when accessed in Fedora 3 directly: no path in db registry for [emory:rr591]

eporter23 commented 3 hours ago

After looking more closely, the above ID does not have a "content" datastream. Making notes elsewhere to remind myself to exclude these records. This title also seems to have come in as a Post-print first, and the Final Published Version is now in Fedora and SOLR.

"dsids":["descMetadata",
          "SYMPLECTIC-ATOM",
          "DC",
          "SYMPLECTIC-LICENCE",
          "provenanceMetadata",
          "RELS-EXT"],