HeardLibrary / vandycite


Perform additional test to prevent creating duplicate items #18

Closed: baskaufs closed this issue 2 years ago

baskaufs commented 2 years ago

For whatever reason, there seem to be some abstract artwork items that I missed when scraping that little Wikidata button on the Commons page. One example is my personal favorite from the Uffizi.

We should manually check any from the Louvre, Uffizi, or other famous galleries because they are likely to already be in Wikidata. I need to check the script to figure out what was causing them to be missed.

There may be alternative ways to search for them, such as downloading all works by an artist or in a gallery and doing fuzzy string matching against our titles.
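A minimal sketch of that alternative, assuming the candidate works are pulled from the Wikidata Query Service by collection (Wikidata property P195, "collection") and compared against our titles with Python's standard-library `difflib`. The function names, the 0.8 similarity cutoff, and the sample titles and QIDs are hypothetical; the collection QID for a gallery such as the Uffizi would need to be looked up.

```python
import difflib


def collection_query(collection_qid: str) -> str:
    """Build a SPARQL query for works in a collection, with English labels.

    P195 is Wikidata's "collection" property; the wd:, wdt:, and rdfs:
    prefixes are predefined on the Wikidata Query Service.
    """
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  ?item wdt:P195 wd:{collection_qid} ;\n"
        "        rdfs:label ?label .\n"
        '  FILTER(LANG(?label) = "en")\n'
        "}"
    )


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(title.lower().split())


def fuzzy_matches(our_title, candidates, cutoff=0.8):
    """Return (qid, label, score) for candidates similar to our_title.

    candidates is a list of (qid, label) pairs, e.g. rows returned by the
    query above. Scores are difflib similarity ratios in [0, 1], best first.
    """
    target = normalize(our_title)
    scored = []
    for qid, label in candidates:
        score = difflib.SequenceMatcher(None, target, normalize(label)).ratio()
        if score >= cutoff:
            scored.append((qid, label, score))
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored


# Hypothetical candidate rows; real ones would come from running the query.
candidates = [("Q1", "The Birth of Venus"), ("Q2", "Primavera")]
print(fuzzy_matches("Birth of Venus", candidates))
```

Any hits above the cutoff would still need a human look, since two different works in the same gallery can have near-identical titles.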

baskaufs commented 2 years ago

I think I've successfully fixed this in https://github.com/HeardLibrary/linked-data/commit/77052c1dd0e761c58f8ba7e4395134f6255e1cfb. Some elements were placed inconsistently within the table, so I narrowed the screening to take only the first row of the image metadata table, and then to keep only the links to Wikidata items (designated with "wiki:" in their title).
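The screening described above might look roughly like this with Python's standard-library `html.parser`. This is a sketch, not the code in the linked commit: it assumes the caller has already isolated the HTML of the image metadata table, that rows are closed with explicit `</tr>` tags, and that the QID appears in the link's `href`. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class FirstRowWikidataLinks(HTMLParser):
    """Collect Wikidata QIDs linked from the first row of a table.

    Only <a> tags whose title attribute contains "wiki:" are kept.
    """

    def __init__(self):
        super().__init__()
        self.tr_depth = 0   # > 0 while inside the first row (nested tables included)
        self.done = False   # set once the first row has closed
        self.qids = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "tr":
            self.tr_depth += 1
        elif tag == "a" and self.tr_depth > 0:
            attr = dict(attrs)
            if "wiki:" in (attr.get("title") or ""):
                match = re.search(r"Q[1-9]\d*", attr.get("href") or "")
                # Keep each QID once, in the order it appears.
                if match and match.group() not in self.qids:
                    self.qids.append(match.group())

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "tr" and self.tr_depth > 0:
            self.tr_depth -= 1
            if self.tr_depth == 0:
                self.done = True  # ignore everything after the first row


def wikidata_qids_from_first_row(table_html: str) -> list:
    parser = FirstRowWikidataLinks()
    parser.feed(table_html)
    parser.close()
    return parser.qids


# Hypothetical table: only the first row's "wiki:" link should be picked up.
html = (
    '<table><tr><td>'
    '<a href="https://www.wikidata.org/wiki/Q123" title="wiki:Q123">item</a>'
    '<a href="/wiki/Category:Foo" title="Category:Foo">category</a>'
    '</td></tr><tr><td>'
    '<a href="https://www.wikidata.org/wiki/Q999" title="wiki:Q999">other</a>'
    '</td></tr></table>'
)
print(wikidata_qids_from_first_row(html))
```

Counting `<tr>` depth rather than stopping at the first `</tr>` keeps a table nested inside the first row from ending the scan early.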

The results are in this table. We'll need to manually change the links over from the incorrect items to the ones identified here.
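A small sketch for turning the results into a worklist for those manual changes, assuming both the current links and the newly identified items are available as mappings from Commons image filename to Wikidata QID. The mapping format, function name, and sample data are hypothetical; the edits themselves would still be made by hand.

```python
def links_to_fix(current, identified):
    """List (image, old_qid, new_qid) wherever the new scrape disagrees.

    current and identified map Commons image filenames to QIDs; an image
    missing from current (or mapped to None) had no link before.
    """
    fixes = []
    for image, new_qid in identified.items():
        old_qid = current.get(image)
        if new_qid and old_qid != new_qid:
            fixes.append((image, old_qid, new_qid))
    # Sort by filename so the list is easy to work through manually.
    fixes.sort(key=lambda row: row[0])
    return fixes


# Hypothetical data for illustration.
current = {"A.jpg": "Q10", "B.jpg": "Q20"}
identified = {"A.jpg": "Q10", "B.jpg": "Q21", "C.jpg": "Q30"}
for image, old, new in links_to_fix(current, identified):
    print(f"{image}: {old} -> {new}")
```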

I think the script is now reliable enough that we don't need to go to extreme measures to look for more.