HeardLibrary / vandycite


Check for Div School duplicates #66

Closed by baskaufs 1 year ago

baskaufs commented 2 years ago

We should download all Div School works and run fuzzy matching on them before doing any more uploads. Charlotte thinks there may already be some duplicates, either because the works had no DOIs in their Wikidata records or because their DOIs were assigned later.
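
For reference, a minimal sketch of what fuzzy matching on labels could look like, using only the standard library's `difflib`. The function names, the 0.9 threshold, and the sample records are all assumptions made for illustration; nothing here comes from the vandycite scripts.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't affect the score."""
    return " ".join(title.lower().split())


def find_possible_duplicates(works, threshold=0.9):
    """Return (id_a, id_b, score) for each pair of labels scoring at or above threshold.

    works: list of (item_id, label) tuples.
    Every pair is compared, so this is O(n^2) and only suits modest lists.
    """
    matches = []
    for (id_a, label_a), (id_b, label_b) in combinations(works, 2):
        score = SequenceMatcher(None, normalize(label_a), normalize(label_b)).ratio()
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 3)))
    return matches


# Made-up placeholder records, not real Wikidata items
works = [
    ("Q_A", "The Ethics of Hospitality"),
    ("Q_B", "The ethics of  hospitality"),
    ("Q_C", "Preaching in a Secular Age"),
]
print(find_possible_duplicates(works))
```

The threshold would need tuning against real labels; too low and unrelated short titles start matching, too high and subtitle variants get missed.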

baskaufs commented 2 years ago

This needs to be done before finishing #52

baskaufs commented 1 year ago

Didn't really need to do the fuzzy matching. I just downloaded the data using https://github.com/HeardLibrary/linked-data/blob/master/vanderbot/acquire_wikidata_metadata.py and then sorted and visually scanned it.
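
A rough sketch of the sort-and-scan approach, in case it helps anyone repeating it. The `qid` and `label` column names and the inline sample data are hypothetical; the actual output of the download script may be laid out differently.

```python
import csv
import io
from itertools import groupby


def normalized(label):
    """Lowercase and collapse whitespace so near-identical labels sort together."""
    return " ".join(label.lower().split())


def sorted_for_scanning(rows, key="label"):
    """Sort rows by normalized label so likely duplicates end up adjacent."""
    return sorted(rows, key=lambda row: normalized(row[key]))


def exact_repeats(rows, key="label", id_key="qid"):
    """Return lists of ids whose labels are identical after normalization."""
    groups = []
    ordered = sorted_for_scanning(rows, key)
    for _, group in groupby(ordered, key=lambda row: normalized(row[key])):
        ids = [row[id_key] for row in group]
        if len(ids) > 1:
            groups.append(ids)
    return groups


# Hypothetical CSV content standing in for the downloaded metadata
sample = io.StringIO(
    "qid,label\n"
    "Q_C,Preaching in a Secular Age\n"
    "Q_A,The Ethics of Hospitality\n"
    "Q_B,the ethics of hospitality\n"
)
rows = list(csv.DictReader(sample))

for row in sorted_for_scanning(rows):
    print(row["qid"], row["label"])
print(exact_repeats(rows))
```

Printing the sorted rows is the part meant for eyeballing; `exact_repeats` only catches labels that match exactly once case and spacing are ignored.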