HeardLibrary / vandycite


Check works to be written against artwork titles #73

Closed by baskaufs 2 years ago

baskaufs commented 2 years ago

To make sure we don't create duplicates, download all artwork titles and fuzzy-match them against the titles of the works to be written. For each potential title match, also fuzzy-match the artists. If both title and artist match, move the work to a separate list for manual checking and remove it from the artworks to write.
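
A minimal sketch of that two-stage check, for illustration only. It is not the code in the notebook: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever fuzzy matcher is actually used, and the function names, the `title`/`artist` keys, and the 0.9 cutoffs are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def split_candidates(to_write, existing, title_cutoff=0.9, artist_cutoff=0.9):
    """Separate works that look like duplicates of existing artworks.

    to_write and existing are lists of dicts with hypothetical 'title' and
    'artist' keys. Returns (keep, review): works that are safe to write, and
    pairs set aside for manual checking.
    """
    keep, review = [], []
    for work in to_write:
        match = None
        for item in existing:
            # Stage 1: skip existing items whose title is not a close match.
            if similarity(work["title"], item["title"]) < title_cutoff:
                continue
            # Stage 2: only flag the work if the artist also matches.
            if similarity(work["artist"], item["artist"]) >= artist_cutoff:
                match = item
                break
        if match is None:
            keep.append(work)
        else:
            review.append({"candidate": work, "possible_duplicate": match})
    return keep, review
```

Calling `split_candidates(works_to_write, downloaded_artworks)` would return the reduced list to write plus the pairs needing a manual look. The cutoffs would need tuning against real titles, since short or generic titles can score high by chance.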

baskaufs commented 2 years ago

Created the script https://github.com/HeardLibrary/vandycite/blob/master/act/create_items/check_all_artworks.ipynb to do the check. The results are at https://github.com/HeardLibrary/vandycite/blob/master/act/create_items/artwork_matches.csv

baskaufs commented 2 years ago

Added an extra cell to disambiguate_prior_to_phase_2b.ipynb to remove the potential duplicates; see commit https://github.com/HeardLibrary/vandycite/commit/43f2cf7a2baf035ccd8cdb6803d6cccf40893429

Revised add_to_wikidata.csv