HeardLibrary / vandycite

Investigate whether there's a way to pull artwork data down from Wikidata to disambiguate #69

Closed baskaufs closed 2 years ago

baskaufs commented 2 years ago

I'm concerned about creating duplicate items. Check how many items in Wikidata are some kind of artwork; if there aren't too many, download them all and try fuzzy string matching against them.

baskaufs commented 2 years ago

It actually wasn't at all hard to do this. Used the query:

select distinct ?item ?label where {
  ?item wdt:P31 ?class.
  {?item wdt:P31 wd:Q15711026.} # altarpiece
  union {?item wdt:P31 wd:Q93184.} # drawing
  union {?item wdt:P31 wd:Q22669139.} # fresco
  union {?item wdt:P31 wd:Q15123870.} # lithograph
  union {?item wdt:P31 wd:Q8362.} # miniature
  union {?item wdt:P31 wd:Q133067.} # mosaic
  union {?item wdt:P31 wd:Q219423.} # mural
  union {?item wdt:P31 wd:Q125191.} # photograph
  union {?item wdt:P31 wd:Q11060274.} # print
  union {?item wdt:P31 wd:Q1064538.} # quilt
  union {?item wdt:P31 wd:Q245117.} # relief sculpture
  union {?item wdt:P31 wd:Q860861.} # sculpture
  union {?item wdt:P31 wd:Q2282251.} # seven-branched candlestick
  union {?item wdt:P31 wd:Q1473346.} # stained glass
  union {?item wdt:P31 wd:Q179700.} # statue
  union {?item wdt:P31 wd:Q18761202.} # watercolor painting
  union {?item wdt:P31 wd:Q48498.} # illuminated manuscript
  union {?item wdt:P31 wd:Q184296.} # tapestry
  union {?item wdt:P31 wd:Q811979.} # architectural structure
  union {?item wdt:P31 wd:Q1278452.} # polyptych
  union {?item wdt:P31 wd:Q15727816.} # painting series
  union {?item wdt:P31 wd:Q16744570.} # tablet
  union {?item wdt:P31 wd:Q87167.} # manuscript
  union {?item wdt:P31 wd:Q16905563.} # cycle of paintings
  union {?item wdt:P31 wd:Q132137.} # icon
  union {?item wdt:P31 wd:Q18887969.} # copper engraving print
  union {?item wdt:P31 wd:Q28823.} # textile
  union {?item wdt:P31 wd:Q2293362.} # sculptural group
  ?item wdt:P18 ?image.
  ?item rdfs:label ?label.
  filter(lang(?label)='en')
  }

This query, together with a variation restricted to paintings, was used to download the English labels into two CSVs: wikidata_paintings.csv (13 MB, 174547 items) and wikidata_other_artwork_types.csv (13 MB, 182338 items). Both are in the OneDrive folder wikidata_data, act.
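
For anyone repeating the download, here is a minimal sketch of how a query of this shape could be built and saved as CSV from Python. It is not the method used above, which isn't recorded in this thread. The names `ARTWORK_CLASSES`, `build_query`, and `download_csv` are hypothetical, only a few of the classes are listed, and the `values` clause is a swapped-in, more compact alternative to the chain of `union` blocks. It assumes the public Wikidata Query Service endpoint and its CSV output via the `Accept: text/csv` header.

```python
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed; not stated in the thread).
ENDPOINT = "https://query.wikidata.org/sparql"

# A few of the artwork classes from the query above (QID -> label).
# Hypothetical helper data: extend with the remaining classes as needed.
ARTWORK_CLASSES = {
    "Q93184": "drawing",
    "Q133067": "mosaic",
    "Q860861": "sculpture",
    "Q179700": "statue",
}


def build_query(classes):
    """Return a SPARQL query for items of the given classes that have
    an image (P18) and an English label."""
    values = " ".join(f"wd:{qid}" for qid in classes)
    return (
        "select distinct ?item ?label where {\n"
        f"  values ?class {{ {values} }}\n"
        "  ?item wdt:P31 ?class.\n"
        "  ?item wdt:P18 ?image.\n"
        "  ?item rdfs:label ?label.\n"
        "  filter(lang(?label)='en')\n"
        "}"
    )


def download_csv(query, path, user_agent):
    """Run the query and write the CSV response to path (needs network)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "text/csv", "User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `download_csv(build_query(ARTWORK_CLASSES), "artworks.csv", "my-script/0.1 (contact address)")` would write a file with `item` and `label` columns; the file name and user-agent string here are placeholders.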

baskaufs commented 2 years ago

Started the script github/vandycite/act/create_items/check_all_artworks.ipynb to do the checking, but didn't get very far.

baskaufs commented 2 years ago

Completed work on the script and ran it (it took about a day to run). The result is at https://github.com/HeardLibrary/vandycite/blob/master/act/create_items/artwork_matches.csv
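
The checking script itself isn't reproduced in this thread, so the following is only a rough sketch of what fuzzy label matching against the downloaded CSVs could look like. It assumes the CSVs have `item` and `label` columns (the variable names in the query) and uses `difflib.SequenceMatcher` from the standard library as a stand-in; the notebook may well use a different matching library and threshold. The function names and the 0.9 cutoff are hypothetical.

```python
import csv
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def read_rows(csv_file):
    """Read (item, label) pairs from an open CSV file with those columns."""
    reader = csv.DictReader(csv_file)
    return [(row["item"], row["label"]) for row in reader]


def find_matches(local_labels, wikidata_rows, threshold=0.9):
    """Compare every local label with every Wikidata label and keep the
    pairs whose similarity is at or above the threshold."""
    matches = []
    for local in local_labels:
        for item, label in wikidata_rows:
            score = similarity(local, label)
            if score >= threshold:
                matches.append((local, item, label, round(score, 3)))
    return matches
```

A brute-force comparison like this does work proportional to the number of local labels times the number of Wikidata labels, so with several hundred thousand downloaded labels a long run time would not be surprising.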