Closed by baskaufs 10 months ago
This has been partially addressed in the work_flow.ipynb script, in the "Process 3D works to determine type" section. I believe that all of the works categorized as "work of art" are 3D works, so their types can be broken out through the label processing done in that section. Mary Anne C. wants to align the eventual classes with standards for art museums. The text descriptions pulled out there as object descriptions need to be matched with Wikidata items serving as classes. Danni Huang is willing to work on this; she would need to research the subclassing as it currently exists in Wikidata and find out what the best practices are in the Wikidata arts community. The Sum of All Paintings group has some guidelines, but I don't think they extend much to 3D works, so some exploration will be required.
Email from Mary Anne C on 2021-11-02:
We also need whatever is developed to work within museum standard cataloging language, which is not necessarily like Dublin Core. (Getty Thesaurus is used in our internal database and Artstor, while Chenhall's Nomenclature is often used by history museums.)
Link to hierarchy diagram: https://github.com/HeardLibrary/vandycite/issues/50#issuecomment-1022657853. @MaryAnneC should take a look.
The Getty Art & Architecture Thesaurus (AAT) has a built-in hierarchy. Here's an example for ceremonial mask.
I think that @MaryAnneC said this is what the Gallery uses in the internal database and Artstor.
We should also download Chenhall's Nomenclature. It's available in Excel and JSON-LD (script-readable).
The Sum of All Paintings people have a great list of artwork types that we should also look at.
@baskaufs should try to load the Chenhall's Nomenclature JSON-LD into Fuseki to see if he can do a federated query linking AAT, Wikidata, and Nomenclature.
Was able to load it with no problem and ran the following federated query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?class ?classLabel ?concept ?label WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?work wdt:P195 wd:Q18563658.
    ?work wdt:P31 ?class.
    ?class rdfs:label ?classLabel.
    FILTER(lang(?classLabel) = "en")
  }
  OPTIONAL {
    ?concept skos:exactMatch ?class.
    ?concept skos:prefLabel ?label.
    FILTER(lang(?label) = "en")
  }
}
Many of the Wikidata classes had matches with similar labels.
Met with Danni on 2022-02-25 and looked at some Python scripts. The action item was to decide which authority/thesaurus to use and then try to match the object_description field in the 3d_parts.csv spreadsheet using fuzzy matching. That CSV needs to be loadable from the web; then we can set up a Colab notebook to do the matching and possibly part-of-speech tagging.
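As a rough illustration of the fuzzy-matching idea, here is a minimal sketch using only Python's standard-library difflib. The function name best_label_match, the cutoff of 0.6, and the sample labels are assumptions made for illustration; they are not taken from the project notebooks, and a dedicated fuzzy-matching library might be a better fit for the real data.

```python
import difflib


def best_label_match(description, labels, cutoff=0.6):
    """Return (label, score) for the closest thesaurus label, or None if nothing clears the cutoff."""
    # Compare case-insensitively, but remember the original spelling of each label.
    lowered = {label.lower(): label for label in labels}
    candidates = difflib.get_close_matches(
        description.lower(), list(lowered), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    match = candidates[0]
    score = difflib.SequenceMatcher(None, description.lower(), match).ratio()
    return lowered[match], round(score, 2)


# Hypothetical labels; the real ones would come from whichever thesaurus is chosen.
labels = ["ceremonial mask", "bowl", "vase", "figurine"]
print(best_label_match("Ceremonial masks", labels))
print(best_label_match("xyz", labels))
```

Anything that returns None, or scores only just above the cutoff, would probably need manual review before being used as a class assignment.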
@MaryAnneC is going to get a report from Collector Systems with the accession number and "object type" field for us.
Loaded entire AAT linked data into new Neptune database along with Nomenclature for Museum Cataloging, so it can now be queried for crosslinks. Need to get a dump of the Wikidata metadata to provide labels and superclassing.
Created a script that uses NLTK on the object_description field of the 3d_parts.csv spreadsheet to pull out the main descriptive nouns. Need to try to match them to labels in Nomenclature, Getty, and Wikidata, then see how those concepts are related.
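For readers unfamiliar with the approach, the noun extraction can be sketched roughly as below. This is not the project script; the function names are hypothetical. It assumes Penn Treebank tags (the tag set returned by nltk.pos_tag for English), and tag_description additionally assumes that NLTK and its tokenizer and tagger data are installed.

```python
def descriptive_nouns(tagged_tokens):
    """Keep tokens whose Penn Treebank tag marks a noun (NN, NNS, NNP, NNPS)."""
    return [word.lower() for word, tag in tagged_tokens if tag.startswith("NN")]


def tag_description(text):
    """Tokenize and tag a description with NLTK (needs nltk and its data packages)."""
    import nltk  # imported lazily so the noun filter works without NLTK installed

    return nltk.pos_tag(nltk.word_tokenize(text))


# Hand-tagged example, so the filter can be shown without downloading NLTK data.
example = [
    ("carved", "VBN"),
    ("wooden", "JJ"),
    ("mask", "NN"),
    ("with", "IN"),
    ("feathers", "NNS"),
]
print(descriptive_nouns(example))  # ['mask', 'feathers']
```

With NLTK available, descriptive_nouns(tag_description(text)) would chain the two steps for a single object_description value.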
Created notebook to search Neptune for exact label matches in Nomenclature and Getty.
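A query for exact label matches might look something like the one built below. This is a sketch, not the query from the notebook: the helper name build_label_query is hypothetical, and it assumes both thesauri expose their labels as skos:prefLabel and that a case-insensitive comparison ignoring language tags is acceptable.

```python
def build_label_query(nouns):
    """Build a SPARQL query returning concepts whose prefLabel equals one of the nouns."""
    # Lowercase each noun and escape backslashes and double quotes for a SPARQL string literal.
    values = " ".join(
        '"' + noun.lower().replace("\\", "\\\\").replace('"', '\\"') + '"'
        for noun in nouns
    )
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?term ?concept ?label WHERE {{
  VALUES ?term {{ {values} }}
  ?concept skos:prefLabel ?label .
  FILTER(LCASE(STR(?label)) = ?term)
}}"""


print(build_label_query(["Mask", "bowl"]))
```

Because the FILTER compares every label against the term list, a query like this may be slow on a full AAT load; binding language-tagged literals directly would be faster if the label casing and language tags are known.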
Action items:
Improve P31 values
Document hierarchies for categories of gallery 3D items
Modified nltk_on_3d.ipynb script to output descriptive noun list in https://github.com/HeardLibrary/vandycite/commit/d27e41be72ffbac41b5fd9701a4e5f14b3a54ceb
Used script query_thesauri_for_descriptive_nouns.ipynb to create Thesaurus crosswalk table thesauri_ids.csv in https://github.com/HeardLibrary/vandycite/commit/ca96363a9a2df7e95ca397165f2ad889d8f177aa
Completed this with Chuck's project and the reclassification of posters in fall 2023.
Extracted from #3
Correct the InstanceOf values for the pieces and make them more specific. In particular, "work of art" needs to be refined, and "image" should be photograph, print, painting, etc. Look at the class hierarchy for the classes used. Do the most specific ones (those with a single instance) have parent classes that include the broader classes with many instances (like "painting")? Does "print" (the largest category) actually have narrower subclasses that would be more specific?
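The hierarchy questions above come down to walking subclass-of links upward from each class. A minimal sketch, assuming the links have already been downloaded into a dictionary mapping each class to its direct parents; the class names and edges below are purely illustrative and are not actual Wikidata data.

```python
def ancestors(cls, parents):
    """Return every class reachable from cls by following subclass-of links upward."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def is_narrower_than(specific, broad, parents):
    """True if broad is a direct or indirect superclass of specific."""
    return broad in ancestors(specific, parents)


# Hypothetical subclass-of edges, for illustration only.
parents = {
    "woodcut": ["print"],
    "etching": ["print"],
    "print": ["work of art"],
    "painting": ["work of art"],
}
print(is_narrower_than("woodcut", "print", parents))   # True
print(is_narrower_than("painting", "print", parents))  # False
```

Running is_narrower_than for each single-instance class against the large classes would show which specific classes already roll up to a broader one and which sit outside the hierarchy.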