Closed (rlskoeser closed 5 years ago)
Re-estimating to a 5; I think we didn't account for the complexity of handling the subjects. I decided to implement them as a many-to-many relationship with a new model, but even without that we would have had to resolve the subject URIs to get a useful human-readable label for them.
No reconciliation for items marked GENERIC, PROBLEM, OBSCURE, or ZERO, or for items with titles ending in *
I see Items that are updated with OCLC matched with the following: work URI, edition URI (for best match found), format, genre, and subject list.
I see note in item history that says it was updated by script and the OCLC id that was used
Items that are not PERIODICAL are matched with book
PERIODICALS are matched with types of periodicals
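The skip and matching rules in the checklist above could be expressed roughly like this. This is only a sketch: the `Item` class, its `notations` field, and the function names are hypothetical stand-ins, not the project's actual model or script.

```python
from dataclasses import dataclass, field

# Notation values taken from the checklist; how they are stored is an assumption.
SKIP_NOTATIONS = {"GENERIC", "PROBLEM", "OBSCURE", "ZERO"}


@dataclass
class Item:
    """Hypothetical stand-in for a library item record."""
    title: str
    notations: set = field(default_factory=set)


def should_reconcile(item: Item) -> bool:
    """Return False for items that should be left out of OCLC reconciliation."""
    if item.notations & SKIP_NOTATIONS:
        return False
    # Titles ending in an asterisk are excluded as well.
    if item.title.rstrip().endswith("*"):
        return False
    return True


def search_category(item: Item) -> str:
    """Choose the broad kind of record to match against."""
    return "periodical" if "PERIODICAL" in item.notations else "book"
```

For example, `should_reconcile(Item("Untitled work*"))` returns `False`, and `search_category(Item("A journal", {"PERIODICAL"}))` returns `"periodical"`.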
Finding some interesting genres, though: sometimes the genre is in another language besides English
Finding weird subject fields too: "Personal copy"?
Based on the additional refinements, upgrading from 5pts to 8
@clmahoney I've updated the script based on feedback from you and @jkotin and have re-run it on the test site (I copied over a fresh set of production data and then ran the new version of the script). It now requests English-language, non-electronic editions when it does the OCLC search, but I haven't looked closely to see how much of a difference that's making.
@clmahoney I updated the testing notes to try to indicate what I think you've already signed off on (from your last comment) and added the new things I think you should check.
Here's the summary output from the first time I ran the script:
Processed 7040 items, updated 5331, no matches for 1698
I ran it again to help you test that it's no longer re-checking things that have been previously searched and flagged as no match, here's the output:
Processed 15 items, updated 15, no matches for 0
(There were a handful of OCLC data loading errors on the first run that I'm not currently reporting; those were the 15 that it processed on the second run.)
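To make the re-run behavior concrete, here is a minimal sketch of the bookkeeping, assuming each item's outcome is recorded somewhere and that load errors leave no record. The `status` dict, the `search` callable, and the use of `OSError` for load failures are all hypothetical; the actual script's storage and error handling are not shown in this thread.

```python
from collections import Counter


def run_reconciliation(item_ids, search, status):
    """Process items, skipping any that already have a recorded outcome.

    item_ids: iterable of item identifiers (hypothetical)
    search:   callable returning match data, or None when nothing is found
    status:   dict of item id -> "updated" / "no match", updated in place
    """
    stats = Counter()
    for item_id in item_ids:
        if item_id in status:
            continue  # already updated or flagged as no match; don't re-check
        stats["processed"] += 1
        try:
            match = search(item_id)
        except OSError:
            # Load error: record nothing, so the next run retries this item.
            stats["errors"] += 1
            continue
        if match:
            status[item_id] = "updated"
            stats["updated"] += 1
        else:
            status[item_id] = "no match"
            stats["no_match"] += 1
    return stats


def summary(stats):
    """Format counts in the same shape as the script output quoted above."""
    return (
        f"Processed {stats['processed']} items, "
        f"updated {stats['updated']}, no matches for {stats['no_match']}"
    )
```

With this shape, a second run only touches items that errored the first time, which matches the pattern in the two summaries above.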
It's not strictly part of this story, so I didn't add it to the testing checklist, but feel free to try: you can go to the list of genres or the list of subjects and remove one that you don't want - removing it will remove it from any books it was associated with, without having to edit all those books individually.
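For anyone curious why removing a subject cleans up every book at once: with a many-to-many relationship, the links live in their own join table, so deleting the subject row takes its links with it. The sketch below illustrates the idea with the standard-library `sqlite3` module rather than the project's actual models; the table names, column names, and sample rows are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with books, subjects, and a join table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(
        """
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE book_subject (
            book_id INTEGER NOT NULL REFERENCES book(id) ON DELETE CASCADE,
            subject_id INTEGER NOT NULL REFERENCES subject(id) ON DELETE CASCADE,
            PRIMARY KEY (book_id, subject_id)
        );
        """
    )
    conn.executemany("INSERT INTO book (id, title) VALUES (?, ?)",
                     [(1, "Book A"), (2, "Book B")])
    conn.executemany("INSERT INTO subject (id, name) VALUES (?, ?)",
                     [(1, "Example subject"), (2, "Personal copy")])
    conn.executemany("INSERT INTO book_subject (book_id, subject_id) VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 2)])
    return conn


def subjects_for(conn, title):
    """Return the sorted subject names attached to one book."""
    rows = conn.execute(
        """
        SELECT s.name FROM subject s
        JOIN book_subject bs ON bs.subject_id = s.id
        JOIN book b ON b.id = bs.book_id
        WHERE b.title = ? ORDER BY s.name
        """,
        (title,),
    )
    return [name for (name,) in rows]


def remove_subject(conn, name):
    """Delete one subject; the cascade drops its links to every book."""
    conn.execute("DELETE FROM subject WHERE name = ?", (name,))
```

After `remove_subject(conn, "Personal copy")`, both books still exist, but neither lists that subject any more.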
@clmahoney I'm not sure that it's possible to completely fix the non-English results or completely exclude ebooks. Can you tell if the results are better?
Notes for testing
additional testing May 21
dev notes