HeardLibrary / vandycite


classification ("instance of" property) #23

Closed baskaufs closed 10 months ago

baskaufs commented 2 years ago

Extracted from #3

Correct the "instance of" (P31) values for the pieces and make them more specific. In particular, "work of art" needs to be refined, and "image" should become photograph, print, painting, etc. Look at the class hierarchy for the classes used. Do the most specific ones (those with a single instance) have parent classes that include the broader classes with many instances (like "painting")? Does "print" (the largest category) actually have narrower subclasses that would be more specific?

baskaufs commented 2 years ago

This has been partially addressed in the work_flow.ipynb script, in the "Process 3D works to determine type" section. I believe that all of the works categorized as "work of art" are 3D works, so their types can be broken out through the label processing done there. Mary Anne C. wants to align the eventual classes with standards for art museums. The text descriptions pulled out there as object descriptions need to be matched with Wikidata items serving as classes. Danni Huang is willing to work on this; she would need to research the subclassing as it currently exists in Wikidata and look at best practices in the Wikidata arts community. The Sum of All Paintings group has some guidelines, but I don't think they extend much to 3D works, so some exploration will be required.

baskaufs commented 2 years ago

Email from Mary Anne C on 2021-11-02:

We also need whatever is developed to work within museum-standard cataloging language, which is not necessarily like Dublin Core. (The Getty Thesaurus is our internal database and Artstor, while Chenhall's Nomenclature is often used by history museums.)

dannihuang830 commented 2 years ago

Link to hierarchy diagram at https://github.com/HeardLibrary/vandycite/issues/50#issuecomment-1022657853 @MaryAnneC should take a look

baskaufs commented 2 years ago

The Getty Arts and Architecture Thesaurus has a built-in hierarchy. Here's an example for ceremonial mask.

I think that @MaryAnneC said this is what the Gallery uses in the internal database and ArtStor.

We should also download Chenhall's Nomenclature. It's available in Excel and JSON-LD (script-readable) formats.

baskaufs commented 2 years ago

The Sum of All Paintings people have a great list of artwork types that we should also look at.

baskaufs commented 2 years ago

@baskaufs should try to load Chenhall's Nomenclature JSON-LD into Fuseki to see if he can do a federated query linking AAT, Wikidata, and the Nomenclature.

baskaufs commented 2 years ago

Was able to load it with no problem and ran the following federated query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?class ?classLabel ?concept ?label WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?work wdt:P195 wd:Q18563658.
    ?work wdt:P31 ?class.
    ?class rdfs:label ?classLabel.
    FILTER(lang(?classLabel) = "en")
  }
  OPTIONAL {
    ?concept skos:exactMatch ?class.
    ?concept skos:prefLabel ?label.
    FILTER(lang(?label) = "en")
  }
}

Many of the Wikidata classes had matches with similar labels.
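The query can be parameterized so it's reusable for other collections. A minimal sketch in Python; the function name and the idea of sending the result to a local Fuseki endpoint are my assumptions, not code from the project:

```python
def build_crosswalk_query(collection_qid: str) -> str:
    """Build the federated SPARQL query that lists the Wikidata P31 classes
    of works in a collection (via P195) and any locally loaded thesaurus
    concepts that are skos:exactMatch to those classes.
    `collection_qid` is a Wikidata Q ID such as "Q18563658"."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?class ?classLabel ?concept ?label WHERE {{
  SERVICE <https://query.wikidata.org/sparql> {{
    ?work wdt:P195 wd:{collection_qid}.
    ?work wdt:P31 ?class.
    ?class rdfs:label ?classLabel.
    FILTER(lang(?classLabel) = "en")
  }}
  OPTIONAL {{
    ?concept skos:exactMatch ?class.
    ?concept skos:prefLabel ?label.
    FILTER(lang(?label) = "en")
  }}
}}"""
```

The resulting string could then be POSTed to the Fuseki SPARQL endpoint with any HTTP client.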

baskaufs commented 2 years ago

Met with Danni on 2022-02-25 and looked at some Python scripts. The action item was to decide which authority/thesaurus to use and then try to match the object_description field in the `3d_parts.csv` spreadsheet using fuzzy matching. That CSV needs to be loadable from the web; then we can set up a Colab notebook to do the matching and possibly part-of-speech tagging.
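Fuzzy matching along these lines can be sketched with the standard library's difflib; whatever matcher the eventual Colab notebook uses may differ, so this is an illustration, not the project's actual code:

```python
from difflib import get_close_matches
from typing import List, Optional

def fuzzy_match_description(description: str, thesaurus_labels: List[str],
                            cutoff: float = 0.6) -> Optional[str]:
    """Return the thesaurus label closest to `description`, or None if no
    label clears the similarity cutoff. Comparison is case-insensitive;
    the original label casing is returned."""
    lowered = {label.lower(): label for label in thesaurus_labels}
    hits = get_close_matches(description.lower(), list(lowered),
                             n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None
```

For example, a misspelled "ceremonal mask" would still match a "Ceremonial mask" label, while a description with no close label returns None.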

baskaufs commented 2 years ago

@MaryAnneC is going to get a report from Collector Systems with the accession number and "object type" field for us.

baskaufs commented 2 years ago

Loaded the entire AAT linked data into a new Neptune database along with the Nomenclature for Museum Cataloging, so they can now be queried for crosslinks. Need to get a dump of the Wikidata metadata to provide labels and superclassing.

baskaufs commented 2 years ago

Created a script that uses NLTK on the object_description field of the 3d_parts.csv spreadsheet to pull out the main descriptive nouns. Need to try to match them to labels in Nomenclature, Getty, and Wikidata, then see how those concepts are related.
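The actual extraction uses NLTK's part-of-speech tagging in nltk_on_3d.ipynb; as a rough illustration of the idea only, here is a deliberately simplified heuristic stand-in (not the NLTK approach itself) that grabs the head noun from descriptions shaped like "ceremonial mask, wood, painted":

```python
import re

def head_noun(description: str) -> str:
    """Crude head-noun extractor: take the text before the first comma or
    semicolon (usually the main noun phrase in an object description),
    then return its last word, lowercased."""
    phrase = re.split(r"[,;]", description)[0].strip()
    words = phrase.split()
    return words[-1].lower().strip(".") if words else ""
```

A real POS tagger handles cases this heuristic gets wrong (e.g. descriptions ending in an adjective), which is why the script relies on NLTK.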

baskaufs commented 2 years ago

Created notebook to search Neptune for exact label matches in Nomenclature and Getty.
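An exact label lookup against the loaded thesauri might look like the following sketch. The skos:prefLabel pattern follows the federated query above, but the query text itself is my assumption, not the notebook's actual code:

```python
def exact_label_query(noun: str) -> str:
    """Build a SPARQL query that finds concepts whose English-language
    skos:prefLabel exactly matches `noun`, case-insensitively."""
    safe = noun.replace('"', '\\"').lower()  # escape quotes, normalize case
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?concept ?label WHERE {{
  ?concept skos:prefLabel ?label.
  FILTER(lcase(str(?label)) = "{safe}")
}}"""
```

Sent to the Neptune SPARQL endpoint, this would return matching concepts from both Nomenclature and Getty, since both are loaded into the same database.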

baskaufs commented 2 years ago

Action items:

Improve P31 values

  1. Modify the script for matching descriptive nouns to Nomenclature concepts so that we can find the Wikidata Q IDs that are exactMatch. Create a CSV table with noun, Nomenclature ID, Wikidata Q ID. (Steve)
  2. Do quality control on the CSV (Danni)
  3. Create a CSV to use with VanderBot to assign the Q IDs as P31 values for 3D works. (Steve and Danni)

Document hierarchies for categories of gallery 3D items

  1. Modify the script for matching descriptive nouns to Nomenclature concepts to find equivalent Wikidata and Getty concepts (for 3D works in the gallery). (Steve)
  2. Find parent concepts for all three systems (Nomenclature, Wikidata, Getty). (Steve and Danni?)
  3. Use Gephi to make hierarchy tree diagrams of the three systems. (Get help from Shenmeng)
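Action item 1 under "Improve P31 values" calls for a CSV with noun, Nomenclature ID, and Wikidata Q ID columns. A minimal sketch of writing that table (the column names and the sample row are hypothetical, not the project's final schema):

```python
import csv
import io

def write_crosswalk(rows, out):
    """Write (noun, nomenclature_id, wikidata_qid) tuples to the
    file-like object `out` as a CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["noun", "nomenclature_id", "wikidata_qid"])
    writer.writerows(rows)
```

In practice `out` would be an open file handle; a StringIO buffer works the same way for testing.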
baskaufs commented 2 years ago

Modified nltk_on_3d.ipynb script to output descriptive noun list in https://github.com/HeardLibrary/vandycite/commit/d27e41be72ffbac41b5fd9701a4e5f14b3a54ceb

baskaufs commented 2 years ago

Used script query_thesauri_for_descriptive_nouns.ipynb to create Thesaurus crosswalk table thesauri_ids.csv in https://github.com/HeardLibrary/vandycite/commit/ca96363a9a2df7e95ca397165f2ad889d8f177aa

baskaufs commented 10 months ago

Completed this with Chuck's project and the reclassification of posters in fall 2023.