OregonDigital / OD2-migration

Project board for migration tracking

International Freshwater Treaties Collection (freshwater-treaties) #1156

Closed wickr closed 1 month ago

wickr commented 1 year ago

Item Count: 656 items

Item Types: Document

Access Restrictions: 0

Complex Objects: 20

pid List: https://github.com/OregonDigital/OD2-migration/blob/master/freshwater-treaties/freshwater-treaties_nocpds.txt https://github.com/OregonDigital/OD2-migration/blob/master/freshwater-treaties/freshwater-treaties_cpds.txt

wickr commented 1 year ago

PDFs fixed, derivatives looking good. Compounds have no errors. The only validation error left is on one work with a Source field, but the data is fine (the verifier doesn't like the line break).

carakey commented 1 year ago



Search/indexing/faceting check:


Work pages:

Collection pages:

carakey commented 1 year ago

There is an issue with the Creator label on the URI http://id.loc.gov/authorities/names/n90718759 (the label should be "Czech Republic" but is instead showing some backend metadata: "Product of split:--Czech Republic--http://rdaregistry.info/Elements/u/P60685"), affecting show pages on 8 works. I attempted to manually remove and re-add the URI on 9c67wr914, which did not correct the label. I then noticed that the Data Sources tab is showing the label as expected, while the main show page has the error. This is true on the edited work and the non-edited ones.

Also note that on Browse All, six works have the erroneous label appearing in the filter list, but two works have the correct label.

carakey commented 1 year ago

The Geonames term 'Po River' in Water Basin is causing the same split-faceting issue we saw in cities in multiple counties. I reopened 2452.

carakey commented 1 year ago

A couple of weird PDF mime types showing up. Maybe worth reprocessing and replacing the files.


carakey commented 1 year ago

^ @wickr I will take a stab at the PDF replacement. I'm planning to move forward with item level QA otherwise, assuming these aren't migration issues but underlying system things.

wickr commented 1 year ago

@carakey Ok I fixed the Czech Republic label issue. Somehow that 'Product of split' string got included as another prefLabel at one point. I cleared out the blazegraph cache and refetched, and didn't see it come back. I reindexed all 8 works just to be sure and they're showing together now.

The PDF mpeg one is probably the only one I'd worry about for now, unless you want to do the EXIF one too, but that one might have EXIF metadata somewhere, so it's not really wrong.

Yeah I don't think anything else is migration-related.

carakey commented 1 year ago

thanks -- but what's the difference between pdf (PDF/A) (# 2 with 28 works) and pdf (PDF/A, Portable Document Format) (# 5 with 1 work)?

wickr commented 1 year ago

@carakey not sure. I'm sure we can collapse the values in indexing in the future, but so far we haven't messed with what is being extracted in characterization.
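
A minimal sketch of what collapsing those values could look like, assuming the facet labels always take the form `base (variant, variant, ...)`. The function name `collapse_format_label` is hypothetical and this is not the actual indexing code; it only illustrates the idea on the two labels mentioned above.

```python
import re


def collapse_format_label(label: str) -> str:
    """Drop a trailing parenthesized variant list and lowercase the base name.

    Hypothetical helper: assumes labels look like "pdf (PDF/A)".
    """
    return re.sub(r"\s*\(.*\)\s*$", "", label).strip().lower()


# The two facet values from the comment above.
labels = ["pdf (PDF/A)", "pdf (PDF/A, Portable Document Format)"]

# Both labels reduce to the same base value, so they would share one facet.
collapsed = {collapse_format_label(label) for label in labels}
```

This only changes the value used for faceting; what characterization extracts would stay untouched.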

carakey commented 1 year ago

For the 3 items whose PDFs I replaced, the thumbnails are not appearing on Browse All / search results. The larger thumbnail is showing up ok on the (logged-in admin) work show page, and in the UV.

https://oregondigital.org/concern/documents/gb19fs332 https://oregondigital.org/concern/documents/gb19fr191 https://oregondigital.org/concern/documents/gb19fq887

carakey commented 1 year ago

I noticed on a compound child that the linked parent title is showing an unresolved character encoding: "His Majesty's government..." The encoding shows correctly in metadata and links elsewhere.


carakey commented 1 year ago


In addition to the thumbnail issue in an earlier comment, there are some inconsistencies with fileset downloads. Out of the set of 18 works that I approved for QA:

Aside from these issues, things look good. Collection can be bulk approved once the filesets issues are fixed.

wickr commented 1 year ago

Reindexed all works that weren't showing thumbnails and they're showing now.

Made a ticket for the apostrophe entity reference showing: https://github.com/OregonDigital/OD2/issues/2861

Still looking at the download issues.

wickr commented 1 year ago

For the FileSets, for the 5 pids you linked, I'm seeing Download options showing up and working.

For the compound parents, I'm seeing both Standard and High Quality download options on both of them now. It's possible not all of the FileSets were indexed fully earlier.

I'm going to go ahead and bulk approve.

wickr commented 1 year ago

All works are reviewed and looking good. Counts look good too.

@carakey when you have a chance the collection homepage could use the 4 featured docs/thumbnails

carakey commented 1 year ago

@wickr I finally had a chance and added some featured docs.