Open glenrobson opened 3 years ago
LCTGM (Library of Congress Thesaurus for Graphic Materials) was one I used to work with at the NLW. You can browse and search the headings at the link below; there are over 7,000:
https://www.loc.gov/pictures/collection/tgm/
Here is an example for a person:
https://www.loc.gov/pictures/collection/tgm/item/tgm007607/
It would be interesting to find out how widely this is used in other places, and whether it's still in use or is more of a historical standard. I'm assuming it doesn't see much use in museums or in fields outside of libraries.
Interestingly, it's a property on Wikidata, so potentially this is another source of training data:
https://www.wikidata.org/wiki/Property:P5160
However, only 692 of the 7,000 terms are in use on Wikidata, and there are only 694 images with LCTGM headings, so it's not a great set.
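One way to re-check these counts is to ask the Wikidata Query Service for every item that carries the P5160 property. The snippet below is a minimal sketch, not a tested pipeline: the endpoint and the SPARQL JSON result layout are the standard ones, but the helper names (`build_query_url`, `tgm_ids_by_item`) are made up for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Every item that has a P5160 statement, with the TGM identifier it holds.
QUERY = """
SELECT ?item ?tgm WHERE {
  ?item wdt:P5160 ?tgm .
}
"""


def build_query_url(query=QUERY):
    """Return a GET URL that asks the endpoint for JSON results (hypothetical helper)."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def tgm_ids_by_item(sparql_json):
    """Group TGM identifiers by Wikidata item URI.

    Expects the standard SPARQL JSON results shape:
    {"results": {"bindings": [{"item": {"value": ...}, "tgm": {"value": ...}}]}}
    """
    mapping = {}
    for row in sparql_json["results"]["bindings"]:
        item_uri = row["item"]["value"]
        mapping.setdefault(item_uri, []).append(row["tgm"]["value"])
    return mapping
```

Fetching `build_query_url()` (with a descriptive User-Agent header) and passing the parsed JSON to `tgm_ids_by_item` would give the number of linked items via `len(...)`.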
It looks like the BL also uses this, but it's coded as gmgpc, which was a previous version of the thesaurus that was merged into tgm in 2007.
http://primocat.bl.uk/F/?func=direct&local_base=PRIMO&doc_number=004869894&format=001&con_lng=eng
There is quite an extensive API for the LOC pictures collection and it should be possible to scrape images and subject headings:
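As a rough sketch of what that scraping could look like: the code below builds a search URL and pulls the image URL and subject headings out of a JSON response. I'm assuming the `fo=json` and `sp` (page) query parameters and the `results` / `image` / `subjects` field names from my reading of the API; these should be checked against a real response before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Search endpoint of the LOC pictures catalogue.
BASE = "https://www.loc.gov/pictures/search/"


def build_search_url(query, page=1):
    """Build a search URL. `fo=json` and `sp` are assumed parameter names."""
    return BASE + "?" + urlencode({"q": query, "fo": "json", "sp": page})


def extract_records(payload):
    """Pull title, image URL, and subject headings from a parsed response.

    The field names used here ("results", "image", "subjects") are
    assumptions about the response layout, not confirmed facts.
    """
    records = []
    for result in payload.get("results", []):
        image = result.get("image") or {}
        records.append({
            "title": result.get("title"),
            # Prefer the full-size image, fall back to the thumbnail.
            "image_url": image.get("full") or image.get("thumb"),
            "subjects": list(result.get("subjects") or []),
        })
    return records


def fetch_records(query, page=1):
    """Fetch one page of results over the network and extract the records."""
    with urlopen(build_search_url(query, page)) as response:
        return extract_records(json.load(response))
```

Calling `fetch_records("castles")` would then return a list of dicts pairing each image URL with its headings, assuming the response has the shape described above.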
Another more modern one we were starting to use at NLW was FAST subject headings:
https://www.oclc.org/research/areas/data-science/fast.html
but this looks massive! There are 1.8 million headings, although they are split into Personal names, Corporate names, Meeting names, Geographic names, Events, Titles, Time periods, Topics, and Form/Genre, so Topics on its own could be a smaller set. You can search the headings here:
https://fast.oclc.org/searchfast/
This might be the equivalent of persons in FAST:
It looks like this property is also in Wikidata:
An issue to note the different subject heading standards we could link the COCO terms to in order to make them useful.