HeardLibrary / vandycite


Step 3: identify Commons works linked to non-artwork items that don't have Wikidata artwork items and create the artwork items #9

Closed baskaufs closed 2 years ago

baskaufs commented 3 years ago

The string values should be entered in the ACT data dump CSV file in the ObjectFunction column and the script will assign the Q IDs. Let Steve know if there should be new types added to this list. Charlotte will do
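As a rough illustration of the string-to-Q-ID assignment described above, here is a minimal Python sketch. It assumes the accepted strings are the type labels that appear as comments in the SPARQL query later in this thread. The actual script's lookup table and matching rules are not shown in this issue, so the names `OBJECT_FUNCTION_QIDS` and `object_function_to_qid` are hypothetical.

```python
# Labels and Q IDs copied from the comments in the SPARQL query in this thread.
# The real script's lookup table may differ; this is only an illustration.
OBJECT_FUNCTION_QIDS = {
    "painting": "Q3305213",
    "altarpiece": "Q15711026",
    "drawing": "Q93184",
    "fresco": "Q22669139",
    "lithograph": "Q15123870",
    "miniature": "Q8362",
    "mosaic": "Q133067",
    "mural": "Q219423",
    "photograph": "Q125191",
    "print": "Q11060274",
    "quilt": "Q1064538",
    "relief sculpture": "Q245117",
    "sculpture": "Q860861",
    "seven-branched candlestick": "Q2282251",
    "stained glass": "Q1473346",
    "statue": "Q179700",
    "watercolor painting": "Q18761202",
    "illuminated manuscript": "Q48498",
    "tapestry": "Q184296",
}


def object_function_to_qid(value):
    """Return the Q ID for an ObjectFunction string, or None if unrecognized."""
    # Assumed normalization: ignore surrounding whitespace and letter case.
    key = value.strip().lower()
    return OBJECT_FUNCTION_QIDS.get(key)
```

A value that returns `None` would be a candidate for a new type to report, rather than something to guess a Q ID for.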

baskaufs commented 3 years ago

Complications:

  1. Many of the cases will have minimal data from Commons. For example, this. Issues: who is actually the artist? What recognition is necessary for the Commons user Judgefloro?
  2. Here is a case where the artwork is in a gallery (BYU), but the gallery hasn't made the metadata freely available: commons page. See this from the gallery and this from some company (associated with the gallery?). In this case, how much metadata from ACT can be used to make up for the lack of information from the gallery?

baskaufs commented 3 years ago

This query should just give us the works that we need to create Wikidata artwork items for (then delete the ACT ID from the non-artwork hits):

# Since all cleanup has been done where a work has a Wikidata item that's an artwork
# this query should just be the works that need to have artwork items created.
select distinct ?work1 ?actId1 ?work1Label ?class1Label where {
  ?work1 wdt:P9092 ?actId1.
  ?work1 wdt:P18 ?commonsImage.
  optional {?work1 wdt:P31 ?class1.}

  minus {?work1 wdt:P31 wd:Q3305213.} # painting
  minus {?work1 wdt:P31 wd:Q15711026.} # altarpiece  
  minus {?work1 wdt:P31 wd:Q93184.} # drawing
  minus {?work1 wdt:P31 wd:Q22669139.} # fresco
  minus {?work1 wdt:P31 wd:Q15123870.} # lithograph
  minus {?work1 wdt:P31 wd:Q8362.} # miniature
  minus {?work1 wdt:P31 wd:Q133067.} # mosaic
  minus {?work1 wdt:P31 wd:Q219423.} # mural
  minus {?work1 wdt:P31 wd:Q125191.} # photograph
  minus {?work1 wdt:P31 wd:Q11060274.} # print
  minus {?work1 wdt:P31 wd:Q1064538.} # quilt
  minus {?work1 wdt:P31 wd:Q245117.} # relief sculpture
  minus {?work1 wdt:P31 wd:Q860861.} # sculpture
  minus {?work1 wdt:P31 wd:Q2282251.} # seven-branched candlestick
  minus {?work1 wdt:P31 wd:Q1473346.} # stained glass
  minus {?work1 wdt:P31 wd:Q179700.} # statue
  minus {?work1 wdt:P31 wd:Q18761202.} # watercolor painting
  minus {?work1 wdt:P31 wd:Q48498.} # illuminated manuscript
  minus {?work1 wdt:P31 wd:Q184296.} # tapestry

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
 }
order by ?work1Label
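
If the list of artwork types grows, the `minus` clauses could be generated rather than edited by hand. The following Python sketch builds the same query shape from a list of (Q ID, label) pairs. It is only a sketch: `ARTWORK_CLASSES` is abbreviated to three of the types above, and `build_query` is a hypothetical helper, not part of the existing scripts.

```python
# Abbreviated list for illustration; the query above excludes 19 types.
ARTWORK_CLASSES = [
    ("Q3305213", "painting"),
    ("Q93184", "drawing"),
    ("Q125191", "photograph"),
]


def build_query(classes):
    """Build the SPARQL query text with one minus clause per artwork class."""
    minus_lines = "\n".join(
        f"  minus {{?work1 wdt:P31 wd:{qid}.}} # {label}" for qid, label in classes
    )
    return (
        "select distinct ?work1 ?actId1 ?work1Label ?class1Label where {\n"
        "  ?work1 wdt:P9092 ?actId1.\n"
        "  ?work1 wdt:P18 ?commonsImage.\n"
        "  optional {?work1 wdt:P31 ?class1.}\n"
        "\n"
        f"{minus_lines}\n"
        "\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}\n"
        "order by ?work1Label"
    )


if __name__ == "__main__":
    print(build_query(ARTWORK_CLASSES))
```

The printed text can be pasted into the Wikidata Query Service as-is.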

baskaufs commented 3 years ago

Used the "Script to pull data for works needing processing" section of the compare_metadata_sources.ipynb notebook to generate the files act_data_fix.csv and commons_data_fix.csv, which we can use to create the items for this phase.

baskaufs commented 2 years ago

Created a table of candidate properties to use here

baskaufs commented 2 years ago

Fixed script to output potential artist matches. (task 4)

baskaufs commented 2 years ago

Deciding on what is the primary artwork is still a bit confusing. For example, https://diglib.library.vanderbilt.edu//act-imagelink.pl?RC=55547 is a house. Is the house an artwork and the photo another artistic representation of it? I suppose the architect (if there was one) would be the artist. In that case the type of the artwork should be "architectural structure" and not photograph.

In contrast, if it's a photo of a person or natural feature, the photo is the artwork since no human designed the depicted object. In that case the type of the artwork would be "photograph".

The key distinction seems to be whether the thing depicted in the photograph was designed by a person or not.
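
The rule can be written down as a small decision function. This is a sketch of the reasoning in this comment only, not code from the existing script. The function name, arguments, and returned keys are all hypothetical, and whether the depicted thing was "designed by a person" is assumed to be judged by a human reviewer.

```python
def primary_artwork(depicted_is_human_designed, depicted_type):
    """Decide what the primary artwork is for a photo on Commons.

    depicted_is_human_designed: True if a person designed the depicted thing
        (for example a building), False for people or natural features.
    depicted_type: type label for the depicted thing, used only when it is
        the artwork (for example "architectural structure").
    """
    if depicted_is_human_designed:
        # The depicted object is the artwork; its designer is the artist.
        return {
            "artwork": "depicted object",
            "type": depicted_type,
            "artist_role": "designer",
        }
    # Nothing designed is depicted, so the photo itself is the artwork.
    return {
        "artwork": "photo",
        "type": "photograph",
        "artist_role": "photographer",
    }
```

For the house example above, `primary_artwork(True, "architectural structure")` yields the architectural structure as the artwork with its designer (the architect, if there was one) as the artist.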

baskaufs commented 2 years ago

Particularly for architectural works, the current script frequently assigns the photographer as the artist rather than the architect, so I had to check many of these manually. Some "photographs" also needed to be reclassified as architectural objects. Image recognition might be useful for detecting whether an image shows a building, assuming it can recognize buildings reliably.

baskaufs commented 2 years ago

It was actually pretty rare for items to be correctly identified as "photograph" rather than an underlying artwork, since relatively few images were of humans or natural features. So probably all items that the script designates as "photograph" need to be checked for incorrect typing.
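
One way to pull out the rows that need this manual check is a simple filter over the script's output. The sketch below assumes a CSV with a column holding the assigned type label. The column names `act_id` and `type`, and the sample rows, are made up for illustration and may not match the real files.

```python
import csv
import io


def rows_needing_type_review(csv_text, type_column="type"):
    """Return the rows whose assigned type is "photograph" for manual review."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row[type_column].strip().lower() == "photograph"
    ]


# Made-up sample data; real column names and values may differ.
sample = "act_id,type\n55547,photograph\n12345,painting\n"

for row in rows_needing_type_review(sample):
    print(row["act_id"], row["type"])
```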

baskaufs commented 2 years ago

VanderBot now supports novalue anonymous artists. See #21