Princeton-CDH / mep-django

Shakespeare and Company Project - Python/Django web application
https://shakespeareandco.princeton.edu
Apache License 2.0

As an admin, I want items updated with matching OCLC work URI, best match edition URI, genre, and subjects so that I can include information from OCLC so users will know more about the books. #291

Closed rlskoeser closed 5 years ago

rlskoeser commented 5 years ago

Notes for testing

dev notes

rlskoeser commented 5 years ago

Re-estimating to a 5; I think we didn't account for the complexity of handling the subjects. I decided to implement them as a many-to-many relationship with a new model, but even without that we would have had to resolve the subject URIs to get a useful human-readable label for them.
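To illustrate the shape of the subject handling described above, here is a minimal sketch in plain Python rather than Django models. The class names, the `resolve_label` callable, and the example URIs and labels are all hypothetical; the point is only that each subject URI is resolved to a label once and the resulting record is shared by every item that references it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Subject:
    """A subject identified by URI, with a human-readable label."""
    uri: str
    label: str


@dataclass
class Item:
    """A book; many items can share the same Subject record."""
    title: str
    subjects: List[Subject] = field(default_factory=list)


class SubjectRegistry:
    """Resolve each subject URI to a label once, then reuse the record."""

    def __init__(self, resolve_label: Callable[[str], str]):
        # resolve_label is a hypothetical stand-in for whatever lookup
        # turns a subject URI into a label.
        self._resolve_label = resolve_label
        self._by_uri: Dict[str, Subject] = {}

    def get_or_create(self, uri: str) -> Subject:
        # Only call the resolver the first time a URI is seen.
        if uri not in self._by_uri:
            self._by_uri[uri] = Subject(uri=uri, label=self._resolve_label(uri))
        return self._by_uri[uri]


# Invented URIs and labels, used only to exercise the sketch.
FAKE_LABELS = {
    "http://example.org/subject/1": "Poetry",
    "http://example.org/subject/2": "Paris (France)",
}
calls = []


def fake_resolver(uri: str) -> str:
    calls.append(uri)
    return FAKE_LABELS[uri]


registry = SubjectRegistry(fake_resolver)
book_a = Item("Book A")
book_b = Item("Book B")
book_a.subjects.append(registry.get_or_create("http://example.org/subject/1"))
book_b.subjects.append(registry.get_or_create("http://example.org/subject/1"))
book_b.subjects.append(registry.get_or_create("http://example.org/subject/2"))
```

Both books end up pointing at the same `Subject` record for the shared URI, and the resolver is called once per distinct URI rather than once per book.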

clmahoney commented 5 years ago
rlskoeser commented 5 years ago

Based on the additional refinements, upgrading from 5pts to 8

rlskoeser commented 5 years ago

@clmahoney I've updated the script based on feedback from you and @jkotin and have re-run it on the test site (I copied over a fresh set of production data and then ran the new version of the script). It now requests English-language, non-electronic editions when it does the OCLC search, but I haven't looked closely to see how much of a difference that's making.

@clmahoney I updated the testing notes to try to indicate what I think you've already signed off on (from your last comment) and added the new things I think you should check.

Here's the summary output from the first time I ran the script:

Processed 7040 items, updated 5331, no matches for 1698

I ran it again to help you test that it's no longer re-checking things that have previously been searched and flagged as no match. Here's the output:

Processed 15 items, updated 15, no matches for 0

(There were a handful of OCLC data loading errors on the first run that I'm not currently reporting; those were the 15 that it processed on the second one.)
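For anyone trying to picture the re-run behavior, here is a rough sketch of the logic, not the actual script. It assumes a hypothetical `no_match` flag and `work_uri` field on each item, a hypothetical `OCLCLoadError`, and a `search` callable standing in for the OCLC lookup. Items that error out are left unflagged, so they are the only ones picked up on the next run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class OCLCLoadError(Exception):
    """Hypothetical error for a failed OCLC data load."""


@dataclass
class Item:
    title: str
    work_uri: Optional[str] = None
    no_match: bool = False  # hypothetical flag: searched, nothing found


def needs_search(item: Item) -> bool:
    # Skip items that already matched or were flagged as no match.
    return item.work_uri is None and not item.no_match


def run(items: Iterable[Item], search: Callable[[str], Optional[str]]) -> str:
    processed = updated = no_matches = 0
    for item in items:
        if not needs_search(item):
            continue
        processed += 1
        try:
            uri = search(item.title)
        except OCLCLoadError:
            # Not flagged and not reported, so it is retried next run.
            continue
        if uri is None:
            item.no_match = True
            no_matches += 1
        else:
            item.work_uri = uri
            updated += 1
    return "Processed %d items, updated %d, no matches for %d" % (
        processed, updated, no_matches)


# Stand-in search: "C" fails to load the first time it is requested.
failed_once = set()
FAKE_RESULTS = {"A": "http://example.org/work/a", "C": "http://example.org/work/c"}


def fake_search(title: str) -> Optional[str]:
    if title == "C" and title not in failed_once:
        failed_once.add(title)
        raise OCLCLoadError(title)
    return FAKE_RESULTS.get(title)


items = [Item("A"), Item("B"), Item("C")]
first = run(items, fake_search)
second = run(items, fake_search)
print(first)
print(second)
```

In this toy run, the second pass only processes the one item that failed to load the first time; the matched item and the no-match item are both skipped.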

rlskoeser commented 5 years ago

It's not strictly part of this story, so I didn't add it to the testing checklist, but feel free to try: you can go to the list of genres or the list of subjects and remove one that you don't want. Removing it also removes it from any books it was associated with, so you don't have to edit all those books individually.
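The effect of removing a subject can be sketched with a plain link table. This uses the standard library's `sqlite3` with hypothetical table names, not the Django ORM or the project's real schema; it only shows that deleting the subject row clears its associations while the books themselves are untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical link table for the many-to-many relationship.
    CREATE TABLE item_subjects (
        item_id INTEGER NOT NULL REFERENCES item(id) ON DELETE CASCADE,
        subject_id INTEGER NOT NULL REFERENCES subject(id) ON DELETE CASCADE,
        PRIMARY KEY (item_id, subject_id)
    );
""")

conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, "Book A"), (2, "Book B")])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [(1, "Poetry"), (2, "Fiction")])
conn.executemany("INSERT INTO item_subjects VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2)])

# Remove one unwanted subject; its links to both books go with it.
conn.execute("DELETE FROM subject WHERE name = ?", ("Poetry",))

remaining = conn.execute(
    "SELECT item_id, subject_id FROM item_subjects ORDER BY item_id"
).fetchall()
item_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

After the delete, only the link between Book B and "Fiction" is left, and both books still exist.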

clmahoney commented 5 years ago
rlskoeser commented 5 years ago

@clmahoney I'm not sure that it's possible to completely fix the non-English results or completely exclude ebooks. Can you tell if the results are better?