E-RIHS / schema

https://e-rihs.github.io/schema

Need a materials list #16

Closed jpadfield closed 5 months ago

jpadfield commented 6 months ago

The old IPERION-HS database has a list of material types; these need to be pulled across.

jpadfield commented 6 months ago

@kobbejager The IPERION-HS list covers 150 or so terms. They relate to AAT, but most of them do not actually match. If one considers all of the terms under "materials (substances)" - http://vocab.getty.edu/aat/300010358 - then we will have a few thousand terms. Given that we will be expanding the services being offered, does it make sense not to second-guess this and just let people have an autocomplete of all of the AAT materials, or will this be too slow?

I can add all of the required terms to OpenTheso2, but I was not sure how a "dropdown" of that size would work in Cordra.
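On the "will this be too slow?" question, here is a minimal sketch of prefix autocomplete over a sorted label list using binary search. It says nothing about how Cordra implements its dropdown; the function names `build_index` and `autocomplete` are hypothetical, and the labels are a handful of AAT terms taken from this thread. The point is only that the lookup cost grows with the logarithm of the list size, so a few thousand terms should not be a problem for the search step itself.

```python
from bisect import bisect_left


def build_index(labels):
    """Return (lowercased label, original label) pairs, sorted for prefix lookup."""
    return sorted((label.lower(), label) for label in labels)


def autocomplete(index, prefix, limit=10):
    """Return up to `limit` labels that start with `prefix` (case-insensitive)."""
    key = prefix.lower()
    # Binary search for the first entry that sorts at or after the prefix
    start = bisect_left(index, (key,))
    matches = []
    for lowered, original in index[start:start + limit]:
        if not lowered.startswith(key):
            break  # sorted order: no later entry can match either
        matches.append(original)
    return matches


index = build_index([
    "glass (material)",
    "glue",
    "granite (rock)",
    "gut (animal material)",
    "amber (fossil resin)",
])
suggestions = autocomplete(index, "gl")
print(suggestions)
```

Whether a widget can render that many options comfortably is a separate question from the lookup speed shown here.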

jpadfield commented 6 months ago

Looking at the IPERION-HS list of "Materials", a few of them are not materials, for example "paintings". Do we need a list of the object types a service is targeted towards or has experience with, separate from the one focused on "materials"?

We can get round this a little by getting people to select "painting materials" instead, but this is a very unhelpful and vague term, and I am not sure ones like "sculpture materials", etc. would work either.

jpadfield commented 6 months ago

Matched IPERION-HS Materials

| IPERION-HS Material Term | AAT Term | AAT ID |
| --- | --- | --- |
| alloys | alloy | 300010902 |
| amber | amber (fossil resin) | 300012934 |
| animal bones | bone (material) | 300011798 |
| animal fibres | animal fiber | 300386673 |
| animal gut | gut (animal material) | 300193289 |
| animal parts | animal components | 300251797 |
| base metals | base metal | 300241615 |
| binding media | media (artists' materials) | 300163343 |
| bio-materials | biological material | 300265629 |
| bone | bone (material) | 300011798 |
| canvas | canvas (textile material) | 300014078 |
| carbonate rich rocks | carbonate rock | 300011283 |
| cellulosic materials | cellulosic | 300014437 |
| cement | cement (construction material) | 300010362 |
| ceramic (clay, mud brick, terracotta, earthenware, stoneware, porcelain) | ceramic (material) | 300235507 |
| ceramic (clay/mud brick/terracotta/earthenware/stoneware/porcelain) | ceramic (material) | 300235507 |
| ceramic (terracotta/earthenware/stoneware/porcelain) | ceramic (material) | 300235507 |
| ceramics | ceramic (material) | 300235507 |
| charcoal | charcoal (material) | 300012862 |
| composite | composite material | 300014627 |
| concrete | concrete | 300010737 |
| consolidants | consolidant | 300379405 |
| copper | copper (metal) | 300011020 |
| copper-based alloys | copper alloy | 300010942 |
| corrosion | corrosion (residue material) | 300258515 |
| crystal | crystal (material by form) | 300221157 |
| dyes | dye | 300013029 |
| earth | earth (soil) | 300011734 |
| feldspars | feldspar | 300011087 |
| flint | flint (rock) | 300011143 |
| gemstone | gemstone | 300201964 |
| glass | glass (material) | 300010797 |
| glue | glue | 300014815 |
| granite | granite (rock) | 300011183 |
| human bones | human bone | 300451454 |
| human remains | human remains | 300379896 |
| ink | ink | 300015012 |
| inorganic | inorganic material | 300010360 |
| inorganic compounds | inorganic material | 300010360 |
| lead | lead (metal) | 300011022 |
| leather | leather | 300011845 |
| metal | metal | 300010900 |
| metal and metallurgical By-Products | | 300011055 |
| mineral | mineral | 300011068 |
| modern material | modern materials | 300379525 |
| mordant | mordant (surface preparation material) | 300015299 |
| mortar | mortar (filler) | 300014741 |
| mosses | moss (plant material) | 300011915 |
| obsidians | obsidian | 300011254 |
| ocher | ocher (inorganic material) | 300013951 |
| ocher | ocher (pigment) | 300152219 |
| ore | ore | 300152583 |
| organic | organic material | 300011792 |
| organic pigments | organic pigment | 300013120 |
| paint | paint (coating) | 300015029 |
| paper | paper (fiber product) | 300014109 |
| parchment | parchment (animal material) | 300011851 |
| pigment | pigments | 300013109 |
| plant fibres | plant fiber | 300014031 |
| plant parts | plant components | 300375171 |
| plaster | plaster (composite coating) | 300014922 |
| plastic | plastic (material) | 300014570 |
| poisons | poison | 300412103 |
| pollen | pollen | 300213002 |
| polymers | polymers | 300218300 |
| precious metal | precious metal | 300011054 |
| precious stone | precious stone (material) | 300133732 |
| quartz | quartz (mineral) | 300011132 |
| raw material | raw material | 300015351 |
| renders | render (coating) | 300379688 |
| resin | resin (organic material) | 300012882 |
| rock | rock (inorganic material) | 300011692 |
| sediment | sediment | 300379424 |
| seeds | seed (material) | 300011902 |
| shell | shell (animal material) | 300011829 |
| silver | silver (metal) | 300011029 |
| skeletal reamins | skeletons (animal components) | 300191778 |
| skulls | skulls (skeleton components) | 300191856 |
| slag | slag | 300011790 |
| soil | soil | 300014330 |
| speleothems | speleothems | 300380015 |
| stone | stone (worked rock) | 300011176 |
| synthetic organic pigments | synthetic organic pigment | 300013129 |
| teeth | tooth (material) | 300011855 |
| textiles | textile materials | 300231565 |
| tin | tin (metal) | 300133748 |
| varnishes | varnish | 300014974 |
| water | water (inorganic material) | 300011772 |
| wood | wood (plant material) | 300011914 |

jpadfield commented 6 months ago

Unmatched or Problematic IPERION-HS Materials

Some of these terms are not really covered by AAT; others are included here because they are not actually materials, such as the various heritage object types. Some of these could be converted to a "material", but others indicate the need for a separate drop-down indicating the type of objects a given service is intended to work with or has experience working with.

| IPERION-HS Material Term | AAT Term | AAT ID |
| --- | --- | --- |
| animal teeth | tooth (material) | 300011855 |
| archaeological alloys | alloy | 300010902 |
| archaeological alloys | archaeological objects | 300234110 |
| archaeological metals | archaeological objects | 300234110 |
| archaeological metals | metal | 300010900 |
| carbonised organic remains | carbonization | 300379618 |
| carbonised organic remains | living organisms' remains | 300375170 |
| charred tubers | carbonization | 300379618 |
| charred tubers | tuber (plant material) | 300251432 |
| corals | corals (animals) | 300250925 |
| corrosion inhibitors | | |
| corrosion patina | corrosion (residue material) | 300258515 |
| crop | | |
| frescoes | frescoes (paintings) | 300177433 |
| herbarium | herbaria (display rooms) | 300005814 |
| human teeth | tooth (material) | 300011855 |
| jewels | jewels | 300439889 |
| lakes | lake (pigment) | 300014015 |
| lithic industry | | |
| majolica | maiolica | 300021170 |
| microbiology (algae/bactery/fungi) | microbiology | 300054469 |
| mural painting | mural paintings (visual works) | 300033644 |
| mural paintings | mural paintings (visual works) | 300033644 |
| nanoparticles | nanoparticles | 300438652 |
| native gold | gold (metal) | 300011021 |
| native gold | natural resources | 300067987 |
| organic coatings | coating (material) | 300014907 |
| organic coatings | organic | 300191632 |
| organic coatings | organic compounds | 300379624 |
| organic dyes | dye | 300013029 |
| organic dyes | organic | 300191632 |
| organic dyes | organic compounds | 300379624 |
| organic films | film (material by form) | 300014637 |
| organic films | organic | 300191632 |
| organic films | organic compounds | 300379624 |
| paint materials | | |
| Particles (clay, metal) in aqueous solutions | | |
| patinas | patina (condition) | 300065245 |
| PGM inclusions in precious metals | | |
| plated objects | | |
| pottery | pottery (visual works) | 300010666 |
| salts | | |
| sediments rich in quartz | | |
| silex | | |
| surface patina | patina (condition) | 300065245 |
| synthetic polymers | polymer paint | 300015055 |
| vegetation | vegetation | 300266061 |
| wall painting | mural paintings (visual works) | 300033644 |

jpadfield commented 6 months ago

Having gone through all of these, and given that I had to make a few assumptions to map all of the terms, I think exploiting the full AAT materials list (with perhaps a few group terms hidden) might be better.

But as noted above, I think we need to add an extra field with a list of object types. Perhaps, if we go for a big list, it could take some content from: Visual and Verbal Communication (hierarchy name) - https://www.getty.edu/vow/AATHierarchy?find=&logic=AND&note=&subjectid=300264552

kobbejager commented 5 months ago

I also have the feeling that we should opt for the full AAT materials list as much as possible. If necessary, we could indeed leave out (some of) the group terms. Since we have to start from scratch, I wouldn't care too much about what we had under IPERION-HS.

How do we do this? Should we reference the AAT term (pid) directly, or do we make a copy of the terms in vocab.e-rihs.io? The former solution would require adding a second source to your ecls script. The latter is probably easier and more durable?

Not sure what to do with object types. Possibly in another field, if the same information cannot be captured with materials. Certainly not mixing them with the materials list. Should we ask Laura?

jpadfield commented 5 months ago

I think indicating that a service specialises in examining particular types or classes of objects would be just as useful as indicating which materials they work on. Yes, I think it would be good to talk to Laura - should I do this?
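To make the "separate field" idea concrete, here is a minimal sketch of a service record that keeps materials and object types in two distinct lists of AAT IDs. It is not the E-RIHS schema: the class name `ServiceRecord`, its field names and the service name are all hypothetical. The IDs are taken from the tables above (paint, pigments, frescoes).

```python
from dataclasses import dataclass, field

AAT_BASE = "http://vocab.getty.edu/aat/"


@dataclass
class ServiceRecord:
    """Hypothetical record: materials and object types are never mixed."""
    name: str
    materials: list[str] = field(default_factory=list)     # AAT material IDs
    object_types: list[str] = field(default_factory=list)  # AAT object-type IDs

    def material_uris(self):
        """Expand the stored material IDs to full AAT URIs."""
        return [AAT_BASE + aat_id for aat_id in self.materials]

    def object_type_uris(self):
        """Expand the stored object-type IDs to full AAT URIs."""
        return [AAT_BASE + aat_id for aat_id in self.object_types]


service = ServiceRecord(
    name="Example imaging service",       # made-up service name
    materials=["300015029", "300013109"],  # paint (coating), pigments
    object_types=["300177433"],            # frescoes (paintings)
)
print(service.material_uris())
print(service.object_type_uris())
```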

I think we should be consistent about the source of our controlled lists, so adding them to Opentheso would seem sensible. There is the editorial issue of adding new terms as the AAT develops; however, it also gives us the flexibility to add extra terms if we need them, and then we might send these back to AAT. Adding them to Opentheso would also make it easy for us to remove terms, mainly the ones the AAT uses for its hierarchy that we might not need in a materials list.
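Removing the hierarchy-only terms could look something like the sketch below. It rests on two loud assumptions: that the local copy is a dictionary keyed by AAT ID with a "skos:prefLabel@en" entry (the shape produced by the script later in this thread), and that AAT guide terms can be recognised by a label wrapped in angle brackets. The function names and the placeholder ID "0000000" are made up for illustration.

```python
def is_guide_term(label):
    """Assumption: guide terms carry labels like "<materials by origin>"."""
    return label.startswith("<") and label.endswith(">")


def selectable_terms(cache, extra_hidden=()):
    """Return {AAT ID: label} for the terms a user should be able to pick."""
    hidden = set(extra_hidden)
    result = {}
    for aat_id, entry in cache.items():
        label = entry.get("skos:prefLabel@en") or ""
        if not label or aat_id in hidden or is_guide_term(label):
            continue  # skip unlabelled, manually hidden and guide terms
        result[aat_id] = label
    return result


cache = {
    "300010358": {"skos:prefLabel@en": "materials (substances)"},
    "300010900": {"skos:prefLabel@en": "metal"},
    "0000000": {"skos:prefLabel@en": "<materials by origin>"},  # placeholder ID
}
visible = selectable_terms(cache, extra_hidden=["300010358"])
print(visible)
```

The `extra_hidden` argument covers group terms that cannot be detected from the label and have to be listed by hand.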

However, I can relatively easily create a stub collection in Opentheso, but when that is requested via the ecls script it just returns the full AAT materials list.

jpadfield commented 5 months ago

We could use a variation of the following Python (though this one only pulls English descriptions) to pull the terms from AAT and cache them locally for general use, with the option to refresh as required:

#!/usr/bin/python3

import requests
import json

# Starting AAT term: despite the variable name this is the numeric ID,
# which is appended to base_url below
aat_uri = "300010358"
# materials (substances) (Materials (hierarchy name))
# Note: The matter or substance from which a thing is or may be made; the tangible substance that goes into the makeup of an art work or other physical object. Physical substances, either naturally or synthetically derived, range from specific materials to types of material. Materials may be designated by their properties, or by function or form. Included are raw materials and processed materials.

# Base URL for the Getty AAT API
base_url = "http://vocab.getty.edu/aat/"
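aat = {}  # Collected terms, keyed by AAT ID
tno = 0   # Total number of terms requested so far
cno = 0   # Counter used to print a progress line every 100 terms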

def extract_content(data):
    contents = []

    # "subject_of" is assumed to hold the descriptive notes; adjust the key if
    # the JSON returned by the Getty service is structured differently
    subject_of = data.get("subject_of", [])

    for item in subject_of:
        # Check if 'language' specifies English; adjust according to your structure
        if "language" in item and any(lang.get("_label") == "en" for lang in item.get("language", [])):
            content = item.get("content")
            if content:  # Ensure content is not None or empty
                contents.append(content)

    # Join the contents list into a single string separated by spaces
    concatenated_contents = ' '.join(contents)
    return concatenated_contents

def get_term_details(uri):
    # Build the full URL
    url = f"{base_url}{uri}.json"
    try:
        # A timeout stops one stalled request from hanging the whole crawl
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Raise an error for bad responses

        # Parse the JSON response
        data = response.json()

        # Extract details
        preferred_label = data.get('_label')
        descriptive_note = extract_content(data)
        uri = data.get('id')
        # Missing keys become empty lists so callers can always iterate
        broader_terms = data.get('broader') or []
        narrower_terms = data.get('narrower') or []

        term_details = {
            'preferred_label': preferred_label,
            'descriptive_note': descriptive_note,
            'uri': uri,
            'broader_terms': broader_terms,
            'narrower_terms': narrower_terms,
        }

        return term_details

    except requests.RequestException as e:
        print(f"Error retrieving term details for {uri}: {e}")
        return None

def explore_hierarchy(uri, level=0):
    global tno
    global cno
    aat_code = uri.split('/')[-1]  # Accept either a bare ID or a full URI

    if aat_code not in aat:
        details = get_term_details(aat_code)
        cno = cno + 1
        tno = tno + 1

        # get_term_details returns None on a failed request, so check first
        if details:
            # Print a progress line roughly every 100 terms
            if cno > 99:
                cno = 0
                print("aat term[" + str(tno) + "]: " + str(details['preferred_label']))

            aat[aat_code] = {
                "skos:prefLabel@en": details['preferred_label'],
                "skos:definition@en": details['descriptive_note'],
                "skos:broader": details['broader_terms'],
                "skos:narrower": details['narrower_terms'],
                "skos:exactMatch": details['uri']
            }

            # Recursive call for each narrower term that has an ID
            for narrower_term in details['narrower_terms']:
                narrower_uri = narrower_term.get('id')
                if narrower_uri:
                    explore_hierarchy(narrower_uri, level + 1)

# Start exploring from the provided AAT URI
explore_hierarchy(aat_uri)
with open("aat_materials.json", "w") as file:
    json.dump(aat, file, indent=4)
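
The "cache locally, refresh as required" part is not covered by the script itself. A minimal sketch of one way to do it follows, using the age of the JSON file to decide when to refetch. The function name `load_cached_terms`, its parameters and the 30-day default are assumptions; `fake_fetch` is a stand-in for a function that would run the crawl above and return the `aat` dictionary.

```python
import json
import os
import tempfile
import time


def load_cached_terms(path, fetch, max_age_days=30, force_refresh=False):
    """Return terms from `path`, calling `fetch()` to rebuild the file when
    it is missing, older than `max_age_days`, or a refresh is forced."""
    if not force_refresh and os.path.exists(path):
        age_days = (time.time() - os.path.getmtime(path)) / 86400
        if age_days <= max_age_days:
            with open(path, encoding="utf-8") as file:
                return json.load(file)
    terms = fetch()
    with open(path, "w", encoding="utf-8") as file:
        json.dump(terms, file, indent=4)
    return terms


def fake_fetch():
    # Stand-in for the real crawl, so the example runs without network access
    return {"300010900": {"skos:prefLabel@en": "metal"}}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "aat_materials.json")
    first = load_cached_terms(cache_path, fake_fetch)    # builds the file
    second = load_cached_terms(cache_path, lambda: {})   # served from the file
    forced = load_cached_terms(cache_path, lambda: {}, force_refresh=True)

print(first == second)
```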

kobbejager commented 5 months ago

done