davanstrien / IIIF-ML-experiments


List of cultural subject heading options #9

Open glenrobson opened 3 years ago

glenrobson commented 3 years ago

An issue to note the different subject heading standards we could link the COCO terms to, to make them more useful.

glenrobson commented 3 years ago

LCTGM (Library of Congress Thesaurus for Graphic Materials) was one I used to work with at the NLW. There are over 7,000 headings, which you can browse and search at the link below:

https://www.loc.gov/pictures/collection/tgm/

Here is an example for a person:

https://www.loc.gov/pictures/collection/tgm/item/tgm007607/

It would be interesting to find out how widely this is used elsewhere, and whether it's still in use or is more of a historical standard. I'm assuming it doesn't see much use in museums or other fields outside libraries.

Interestingly, it's a property on Wikidata, so potentially this is another source of training data:

https://www.wikidata.org/wiki/Property:P5160

Although only 692 of the 7,000 terms are in use on Wikidata:

https://query.wikidata.org/#select%20%28count%28distinct%20%3Fo%29%20as%20%3Fcount%29%20%7B%0A%20%20%3Fs%20wdt%3AP5160%20%3Fo%20.%0A%7D%20%0AORDER%20BY%20DESC%28%3Fo%29

and there are only 694 images with LCTGM headings, so not a great set...
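For anyone wanting to re-run these counts from a script rather than the query UI, here is a minimal sketch that builds request URLs for the Wikidata SPARQL endpoint. The first query mirrors the one linked above, minus the `ORDER BY`, which has no effect on a single aggregate row. The second query is only my guess at how an "images with LCTGM headings" count could be obtained (items that have both P5160 and an image, P18); the function names are made up for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def distinct_value_count_query(property_id: str) -> str:
    """SPARQL that counts the distinct values used for one property."""
    return (
        "SELECT (COUNT(DISTINCT ?o) AS ?count) WHERE {\n"
        f"  ?s wdt:{property_id} ?o .\n"
        "}"
    )


def items_with_image_count_query(property_id: str) -> str:
    """SPARQL that counts items having both the property and an image (P18).

    Assumption: this is one plausible reading of "images with LCTGM
    headings", not necessarily the query used for the figure above.
    """
    return (
        "SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE {\n"
        f"  ?s wdt:{property_id} ?o ;\n"
        "     wdt:P18 ?image .\n"
        "}"
    )


def sparql_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # P5160 is the LCTGM property linked above.
    print(sparql_url(distinct_value_count_query("P5160")))
    print(sparql_url(items_with_image_count_query("P5160")))
```

The printed URLs can be fetched with any HTTP client. The counts will have drifted since the numbers above were taken.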

It looks like the BL also uses this, but it's coded as gmgpc, a previous version of the thesaurus that was merged into tgm in 2007.

http://primocat.bl.uk/F/?func=direct&local_base=PRIMO&doc_number=004869894&format=001&con_lng=eng

There is quite an extensive API for the LOC pictures collection and it should be possible to scrape images and subject headings:

http://www.loc.gov/pictures/api
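As a rough idea of what the scraping could look like, here is a sketch that builds a JSON search URL and pulls (image URL, subject headings) pairs out of a response. The `fo=json` and `sp` parameters and the `results`, `image.full` and `subjects` field names are from my reading of the API and should be checked against the documentation linked above before relying on them; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.loc.gov/pictures/search/"


def search_url(query: str, page: int = 1) -> str:
    """Build a search URL that requests JSON.

    Assumption: 'fo=json' selects JSON output and 'sp' is the page number.
    """
    return SEARCH_URL + "?" + urlencode({"q": query, "fo": "json", "sp": page})


def extract_pairs(response: dict) -> list[tuple[str, list[str]]]:
    """Return (image URL, subject headings) for each usable result.

    Assumption: each entry in 'results' has an 'image' dict with a 'full'
    URL and a 'subjects' list. Results missing either are skipped, since
    they are no use as labelled training data.
    """
    pairs = []
    for result in response.get("results", []):
        image = (result.get("image") or {}).get("full")
        subjects = result.get("subjects") or []
        if image and subjects:
            pairs.append((image, list(subjects)))
    return pairs


def fetch_page(query: str, page: int = 1) -> dict:
    """Fetch one page of search results (makes a network request)."""
    with urlopen(search_url(query, page), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for image, subjects in extract_pairs(fetch_page("dogs")):
        print(image, subjects)
```

Note that the subject strings returned this way would still need matching against the TGM headings before they could be used as labels.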

glenrobson commented 3 years ago

Another, more modern one that we were starting to use at NLW is FAST subject headings:

https://www.oclc.org/research/areas/data-science/fast.html

but this looks massive! There are 1.8 million headings, though they are split into Personal names, Corporate names, Meeting names, Geographic names, Events, Titles, Time periods, Topics, and Form/Genre, so the Topics facet on its own could be much smaller. You can search the headings here:

https://fast.oclc.org/searchfast/

This might be the FAST equivalent of the LCTGM person heading above:

https://fast.oclc.org/searchfast/?&limit=altphrase&facet=all&query=Persons&sort=usage+desc&start=0#&single=fst01058861&fullview=simple&sep=click

It looks like this property is also in Wikidata:

https://www.wikidata.org/wiki/Property:P2163