ANZSoilData / def-au-domain

A domain-model for soils, in the form of a set of classes and properties that define the entities ('features' in GIS-speak) that comprise the important concepts in soils.
Creative Commons Zero v1.0 Universal

update observable properties list #30

Open meganrwong opened 1 year ago

meganrwong commented 1 year ago

For observable properties outside of soil and landscape classification and description (soil physiochem plus some biol and a few soil physical measurements like bulk density).

I've consolidated lists of observable properties relating to soil from ENVO/AGRO with the list from the FedUni Obsv service, plus some of my own soil biol ones (very common observations) that I noticed were missing (like biological diversity, fungal to bacterial ratios).

It is in the excel mapping guide sheet for stakeholders atm as draft, but once/if we are happy with this as a list to start we can put in model.

@dr-shorthair An outstanding task I see of value is extracting sosa:ObservableProperty filtered by rdfs:subClassOf  glosis_lh:PhysioChemical - from https://github.com/rapw3k/glosis/blob/master/glosis_layer_horizon.ttl

Then, I can consolidate the current list we have (FedU Obsv service, ENVO/AGRO, random others) with GLOSIS as one list of observable properties. Most will fit under 'Soil Physiochem', and some 'Soil Biological' and 'Soil Physical'

meganrwong commented 1 year ago

This will be a starting list of observable properties on the soil layer to build from

meganrwong commented 1 year ago

btw Touched base w @amacleod-cerdi - we (Soils data team CeRDI) are happy that through this work we are revising their list of Obsv Prop because it will work toward fixing their problem that majority of identifiers don't resolve (were in LDR) or have placeholders

dr-shorthair commented 1 year ago

I'll record what I do:

  1. take a local cache of the GLOSIS ontology repository - https://github.com/rapw3k/glosis.git
  2. Add the folder to your TopBraid Project (drag it into the Project Explorer)
  3. Open glosis_layer_horizon.ttl
  4. Run this SPARQL:
    SELECT * 
    WHERE {
    ?o rdfs:subClassOf* glosis_lh:PhysioChemical ; rdfs:subClassOf ?os .
    ?os a owl:Restriction ; owl:onProperty sosa:observedProperty ; owl:hasValue ?op .
    }

    First you have to understand that the observed-properties of the sub-classes are fixed in owl:Restriction subclasses. The query WHERE clause works as follows:

    • find each sub-class (any depth) ?o of glosis_lh:PhysioChemical, and load each class that ?o is directly declared a sub-class of (these include the owl:Restriction nodes) into the variable ?os
    • test each ?os to see if it matches the pattern (i.e. filter) that it is (i) an owl:Restriction (ii) on the property sosa:observedProperty, and (iii) load the value of its owl:hasValue into the variable ?op

The query SELECT clause reports all the variables that satisfy these patterns.
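
For anyone without a SPARQL engine to hand, the matching logic described above can be sketched in plain Python over a toy set of triples. This is only an illustration of the pattern, not a replacement for running the query: the `ex:` class and property names and the blank node `_:r1` are made up and do not come from GLOSIS.

```python
# Toy triples standing in for the GLOSIS graph. All ex: names and _:r1 are hypothetical.
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:pH", SUBCLASS_OF, "glosis_lh:PhysioChemical"),
    ("ex:pH-H2O", SUBCLASS_OF, "ex:pH"),
    ("ex:pH-H2O", SUBCLASS_OF, "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "sosa:observedProperty"),
    ("_:r1", "owl:hasValue", "ex:pHH2OProperty"),
}


def subclasses_of(root):
    """Emulate rdfs:subClassOf*: the root itself plus sub-classes at any depth."""
    found = {root}
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def observed_properties(root):
    """Return (?o, ?os, ?op) rows matching the two patterns in the WHERE clause."""
    rows = []
    for cls in subclasses_of(root):
        for s, p, sup in triples:
            if s != cls or p != SUBCLASS_OF:
                continue
            is_restriction = (sup, "rdf:type", "owl:Restriction") in triples
            on_observed = (sup, "owl:onProperty", "sosa:observedProperty") in triples
            if is_restriction and on_observed:
                rows += [(cls, sup, v) for s2, p2, v in triples
                         if s2 == sup and p2 == "owl:hasValue"]
    return sorted(rows)


rows = observed_properties("glosis_lh:PhysioChemical")
```

With the toy data, only `ex:pH-H2O` carries a restriction on sosa:observedProperty, so it is the only class that produces a row.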

However, I also see a lot of ObservableProperties in GLOSIS -

dr-shorthair commented 1 year ago

GLOSIS have done a solid job. They are using QUDT and SOSA (and OWL) in a sensible way.

meganrwong commented 1 year ago

Yeah, I think given our priorities atm we focus on properties in the PhysioChemical class. At quick glance I could not see any other grouping immediately helpful to us to plug our major gaps in Observable Properties list/s (Soil Chem, physics, biology). I think it will take some time to go through the concepts outside of PhysioChemical, as many seem to be variations on our ASLS and ASC Observable Properties.....we should map to these at some point though.....

dr-shorthair commented 1 year ago

Yes they are.

In the work I did yesterday I backed off from requiring that all the 'values' be encoded as URIs, and put in the ASLS and ASC codes instead. I thought that might make it easier for the providers, and it also makes it slightly easier to code the choice between either (a code) or (a UCUM string), since these are both literals.

It would be easy to revert to the URIs. I think the FedUni work was very URI focussed. And GLOSIS has a nice OWL pattern for that which we can use.

However, that would lean us more towards making the numeric values a structure rather than a literal, using the glosis_lh:PhysioChemicalValue pattern instead of UCUM string - i.e.

[ 
    a qudt:QuantityValue ;
    qudt:numericValue "1.76"^^xsd:decimal ;
    qudt:unit unit:M
]

vs

"1.76 m"^^cdt:ucum

This would probably make @RossDSearle happy. @abhritchie how do you think that would work in the SHACL/JSON-schema stage?
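
To make the trade-off concrete, here is a rough Python sketch of converting between the two forms. It assumes a simplified "number, space, unit" lexical form (the real cdt:ucum datatype is richer), and the `UCUM_TO_QUDT` lookup is a hypothetical one-entry table using the `unit:M` CURIE from the example above.

```python
import re
from decimal import Decimal

# Hypothetical lookup from UCUM codes to QUDT unit CURIEs (only what the example needs).
UCUM_TO_QUDT = {"m": "unit:M"}

# Simplified pattern: a decimal number, whitespace, then a unit code.
UCUM_LITERAL = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s+(\S+)\s*$")


def to_structure(literal):
    """Split a literal such as '1.76 m' into a QuantityValue-like dict."""
    match = UCUM_LITERAL.match(literal)
    if match is None:
        raise ValueError(f"not a '<number> <unit>' literal: {literal!r}")
    number, ucum_code = match.groups()
    return {"numericValue": Decimal(number), "unit": UCUM_TO_QUDT[ucum_code]}


def to_literal(structure):
    """Collapse the structured form back into a UCUM-style string."""
    qudt_to_ucum = {v: k for k, v in UCUM_TO_QUDT.items()}
    return f"{structure['numericValue']} {qudt_to_ucum[structure['unit']]}"


structured = to_structure("1.76 m")
round_trip = to_literal(structured)
```

The literal is terse but needs parsing before it can be compared numerically, while the structure is more verbose but exposes the number and the unit separately.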

meganrwong commented 1 year ago

So is this what you are doing now - right @dr-shorthair ? You are probably getting this into the ansis model .ttl to replace our current soil chem Observable Properties, then getting that to the front of the excel sheet for end-users. We use this as our base list - I can go through and add any that are missing that I find in Agro/Envo, FedU. We can think about tagging as biol, phys, physio-chem (as per feedback to us) but the lines can get fuzzy using these groupings in my experience and may or may not work....

dr-shorthair commented 1 year ago

No - just chugging through ASLS Soil Profile chapter (again). It's not really until I start to model it that I see how it all hangs together.

meganrwong commented 1 year ago

No - just chugging through ASLS Soil Profile chapter (again). It's not really until I start to model it that I see how it all hangs together.

ok :) Note that I did the same in a way - check out my comments, questions, mapping in the 2nd excel sheet for data providers as you go

amacleod-cerdi commented 1 year ago

Yes they are.

In the work I did yesterday I backed off from requiring that all the 'values' be encoded as URIs, and put in the ASLS and ASC codes instead. I thought that might make it easier for the providers, and it also makes it slightly easier to code the choice between either (a code) or (a UCUM string), since these are both literals.

It would be easy to revert to the URIs. I think the FedUni work was very URI focussed. And GLOSIS has a nice OWL pattern for that which we can use.

However, that would lean us more towards making the numeric values a structure rather than a literal, using the glosis_lh:PhysioChemicalValue pattern instead of UCUM string - i.e.

[ 
    a qudt:QuantityValue ;
    qudt:numericValue "1.76"^^xsd:decimal ;
    qudt:unit unit:M
]

vs

"1.76 m"^^cdt:ucum

This would probably make @RossDSearle happy. @abhritchie how do you think that would work in the SHACL/JSON-schema stage?

VAS/FedUni is using exactly that qudt structure for hasResult, but also providing a hasSimpleResult which has a native integer/float/boolean value. From a JSON(-LD) API point of view, having native JSON datatypes for property values is helpful, for instance to filter by less-than, greater-than etc. This is reflected in the draft JSON-LD API Best Practices.

    "http://www.w3.org/ns/sosa/hasResult": [
      {
        "http://qudt.org/schema/qudt#numericValue": [
          {
            "@value": 95
          }
        ],
        "http://qudt.org/schema/qudt#unit": [
          {
            "@id": "http://registry.it.csiro.au/def/environment/unit/MilligramsPerKilogram",
            "http://www.w3.org/2000/01/rdf-schema#label": [
              {
                "@value": "MilliGrams per Kilogram"
              }
            ]
          }
        ]
      }
    ],
    "http://www.w3.org/ns/sosa/hasSimpleResult": [
      {
        "@value": 95
      }
    ]

...and perhaps try to ignore the registry.it uri :-/
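
As a small illustration of why native datatypes help, the sketch below filters expanded JSON-LD-style records on hasSimpleResult with an ordinary numeric comparison. The three `ex:` observations are invented for the example, and the function is only a guess at how a client might do this, not part of the VAS API.

```python
HAS_SIMPLE_RESULT = "http://www.w3.org/ns/sosa/hasSimpleResult"

# Hypothetical observations: two with native numbers, one with a string result.
observations = [
    {"@id": "ex:obs1", HAS_SIMPLE_RESULT: [{"@value": 95}]},
    {"@id": "ex:obs2", HAS_SIMPLE_RESULT: [{"@value": 12.5}]},
    {"@id": "ex:obs3", HAS_SIMPLE_RESULT: [{"@value": "95 mg/kg"}]},
]


def simple_results_above(items, threshold):
    """Return the @id of each item whose native numeric simple result exceeds threshold."""
    hits = []
    for item in items:
        for result in item.get(HAS_SIMPLE_RESULT, []):
            value = result.get("@value")
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value > threshold:
                hits.append(item["@id"])
    return hits


above_fifty = simple_results_above(observations, 50)
```

The string-valued record is skipped rather than parsed, which is the situation a native hasSimpleResult avoids.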

abhritchie commented 1 year ago

Not sure about SHACL, but in JSON Schema we'd have a set of definitions for QUDT classes. For example:

"$defs": {
    "quantityValue": {
        "title": "qudt:QuantityValue",
        "$comment": "A Quantity Value expresses the magnitude and kind of a quantity and is given by the product of a numerical value n and a unit of measure U. The number multiplying the unit is referred to as the numerical value of the quantity expressed in that unit. Refer to NIST SP 811 section 7 for more on quantity values. See https://qudt.org/schema/qudt/QuantityValue.1",
        "type": "object",
        "required": [
            "numericValue",
            "unit"
        ],
        "properties": {
            "numericValue": {
                "title": "qudt:numericValue",
                "$comment": "See https://qudt.org/schema/qudt/numericValue",
                "type": "number"
            },
            "unit": {
                "title": "qudt:unit",
                "$comment": "A reference to the unit of measure of a quantity (variable or constant) of interest. See https://qudt.org/schema/qudt/unit.",
                "type": "string"
            }
        }
    }
}

The above is quickly put together so the actual JSON Schema may look a bit different when I get back into schema mode.

The value:

{
    "depth": {
        "numericValue": 42,
        "unit": "m"
    }
}
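
To show what the draft definition would accept or reject, here is a hand-rolled check in Python that mirrors only the `required` and `type` constraints sketched above. It deliberately avoids any JSON Schema library, and `quantity_value_errors` is a hypothetical helper, not part of ANSIS tooling.

```python
# Mirrors the draft: both keys required, numericValue a number, unit a string.
REQUIRED = {"numericValue": (int, float), "unit": str}


def quantity_value_errors(candidate):
    """Return a list of problems; an empty list means the object fits the draft."""
    if not isinstance(candidate, dict):
        return ["value is not an object"]
    errors = []
    for key, expected in REQUIRED.items():
        if key not in candidate:
            errors.append(f"missing required key: {key}")
        # JSON Schema does not treat booleans as numbers, so reject bool explicitly.
        elif isinstance(candidate[key], bool) or not isinstance(candidate[key], expected):
            errors.append(f"wrong type for {key}")
    return errors


ok = quantity_value_errors({"numericValue": 42, "unit": "m"})
bad = quantity_value_errors({"numericValue": "42 m"})
```

The second call shows the case the split encoding is meant to prevent: a quantity passed as a single string fails both checks.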

dr-shorthair commented 1 year ago

But neither of you is using the QUDT URI to denote the unit:

Note that the UCUM form "95 mg/kg"^^cdt:ucum or "42 m"^^cdt:ucum is defined here: https://w3id.org/cdt/ucum

meganrwong commented 1 year ago

Bruce replaced all the registry.it.csiro w e.g. http://qudt.org/vocab/unit/MilliGM-PER-KiloGM - we are prob due for a vocab table update! :)
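
A vocab table update of this kind could be scripted roughly as below. The single mapping pairs the two URIs that appear in this thread, on the assumption that they denote the same unit; the `NotMappedYet` URI is invented to show how leftovers might be flagged.

```python
REGISTRY_PREFIX = "http://registry.it.csiro.au/"

# Assumed pairing of the two URIs quoted in this thread; extend from the real mapping table.
OLD_TO_QUDT = {
    "http://registry.it.csiro.au/def/environment/unit/MilligramsPerKilogram":
        "http://qudt.org/vocab/unit/MilliGM-PER-KiloGM",
}


def remap_units(uris):
    """Replace known registry URIs and report registry URIs that have no mapping yet."""
    remapped, unresolved = [], []
    for uri in uris:
        if uri in OLD_TO_QUDT:
            remapped.append(OLD_TO_QUDT[uri])
        else:
            remapped.append(uri)
            if uri.startswith(REGISTRY_PREFIX):
                unresolved.append(uri)
    return remapped, unresolved


remapped, unresolved = remap_units([
    "http://registry.it.csiro.au/def/environment/unit/MilligramsPerKilogram",
    "http://registry.it.csiro.au/def/environment/unit/NotMappedYet",  # hypothetical
])
```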

dr-shorthair commented 1 year ago

Trying to properly align the observable properties and entities from the Soil Profile chapter of ASLS:

First: the profile is ‘merely’ a column (!) along which observations of the things-that-matter are made.

The things-that-matter are (i) material properties, always at selected depths within the column (ii) pedological properties, which are linked to horizons within the column

To get our model right, it would be helpful to understand what properties and entities are tied to the material at a specified depth, and what to horizons. Here's what I understand:

Have I got this right @abigg92 ?

dr-shorthair commented 1 year ago

Ping @abigg92

abigg92 commented 1 year ago

The only attributes that are recorded in a specified depth context are field measurements, i.e. field pH, electrical conductivity, slaking, dispersion etc. Not all measurements are listed in the current YB. All other attributes are recorded in the context of a horizon.

dr-shorthair commented 1 year ago

OK! that's interesting. Obviously not quite what I had gleaned from the chapter, but I guess it makes sense since that whole Soil Profile chapter is oriented towards pedological observations. That really helps. It means that the samples (usually in-situ) on which all those pedological observations are made are always linked to a horizon within a profile. Good to clarify.

Can you confirm that the more generic physico-chemical measurements (bulk density, various chemistry, ...) are a separate thing, which are usually just on samples from specified depths?

bsimons14 commented 1 year ago

CSIRO turned off the QUDT Registry and QUDT2 used a different URI pattern to the one I implemented there so I manually mapped the QUDT URIs in registry.it.CSIRO.au to the new values at QUDT2. These have not yet been imported to the live VAS database.

Cheers Bruce Simons +61475954391 Grand Maître, Fêlés du Grand Colombier (#1681) Club des Cingles du Mont-Ventoux (#12000)

abhritchie commented 1 year ago

But neither of you is using the QUDT URI to denote the unit: [snip] @abhritchie has "m"

Note that the UCUM form "95 mg/kg"^^cdt:ucum or "42 m"^^cdt:ucum is defined here: https://w3id.org/cdt/ucum

I assumed that the ANSIS JSON Schema would serialize the UCUM form but break it into two JSON keys, retaining the benefit of a terse encoding of units using the UCUM rules while avoiding the presentation of quantities as strings that require a regex (or similar) to extract.

If URIs (linked data) are required, then ANSIS should use JSON-LD.

BTW, don't read anything into the UCUM URIs in the JSON Schema example above. They are a quick hack to illustrate alignment - how we link somewhat different approaches is a matter to be discussed (new issue).

abigg92 commented 1 year ago

two extra thoughts regarding the depth thing.

  1. Some years ago, an argument was made to move the samples table from the horizon level up to the observation level. This contradicts the data schema. A physical sample (with a specified upper and lower depth) must, by definition, belong to a horizon (within a profile). The reason for allowing samples out of context was to do with all the data collected by carbon researchers, agronomists etc., who operate on fixed sampling intervals with no concept of horizons/profiles. We can't ignore that there are a lot of people who take samples that way, which is why the compromise was made, but it is the difference between what is 'correct' and what is realistic.
  2. Even a measurement at a specified depth, e.g. field pH at 60 cm, belongs to a horizon, so you can view the measurement two ways - as either being a property at a specified depth, or being a property at a specified depth within a specified horizon. The latter is how it is set up in SITES, for logical reasons.

bsimons14 commented 1 year ago

A physical sample must sample a soil body. It may also fortuitously be a sample of an horizon. It may also be a sample of multiple horizons, such as what a profile samples, or when a specified non-pedologically determined depth range is used.
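
The point about one sample spanning one or several horizons can be sketched as a depth-interval overlap test. The horizon names and depths below are hypothetical, and the function is only one possible way to relate a fixed-interval sample to pedological horizons.

```python
def horizons_sampled(sample_upper_cm, sample_lower_cm, horizons):
    """Return names of horizons whose depth range overlaps the sample interval.

    horizons is a list of (name, upper_cm, lower_cm) tuples, depths increasing downward.
    """
    return [
        name
        for name, upper, lower in horizons
        if sample_upper_cm < lower and sample_lower_cm > upper
    ]


# Hypothetical profile: three horizons with made-up boundaries.
profile = [("A11", 0, 17), ("A12", 17, 35), ("B1", 35, 60)]

within_one = horizons_sampled(25, 30, profile)
across_several = horizons_sampled(10, 40, profile)
```

A 25-30 cm sample falls wholly inside one horizon, while a 10-40 cm fixed interval cuts across all three, which is the case where only the soil body (or the profile) is a safe thing to say it samples.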


dr-shorthair commented 1 year ago

Thanks @abigg92. My recent discussions with the team have highlighted the need to accommodate both the pedologists' data (i.e. what you call 'correct') and also the large amount of data from agronomists, provider-incentives etc. I understand the latter is typically tied to depth, not horizon, and is the bulk of what is in systems like Visualising Australia's Soils (VAS) from CeRDI, and can be useful in broad-scale investigations.

For example, an Observation of a physico-chemical property with some metadata:

my:O2-dens-S1-in-P1
  a sosa:Observation ;
  skos:prefLabel "Measurement of bulk density of sample 1 from profile 1" ;
  sosa:hasFeatureOfInterest my:S1-in-P1 ;
  sosa:hasSimpleResult "1.4 g/cm3" ;
  sosa:observedProperty ansis:density-bulk ;
  sosa:resultTime "2022-09-14T11:35:00"^^xsd:dateTime ;
.

The 'has-feature-of-interest' tag indicates the entity that was actually observed – in this case, a sample from a specified depth range within Profile 1.

my:S1-in-P1
  a ansis:SoilSample ;
  skos:prefLabel "Sample 1 from depth range 25-30 cm in profile P1" ;
  sosa:isSampleOf my:Body-88 ;
  ansis:depth-lower "30 cm" ;
  ansis:depth-upper "25 cm" ;
  ansis:relatedProfile my:Profile_1 ;
  ansis:type "material" ;
.

Here the 'is-sample-of' is just 'the soil body' which could be as generic as 'Old McDonald's Farm upper paddock'.

If there was a pedologist involved, the description of the sample can indicate the horizon, like this

my:S1-A12-in-P1
  a ansis:SoilSample ;
  skos:prefLabel "Sample 1 from depth range 25-30 cm in horizon A12 in profile P1" ;
  sosa:isSampleOf my:Horizon_A12 ;
  ansis:depth-lower "30 cm" ;
  ansis:depth-upper "25 cm" ;
  ansis:relatedProfile my:Profile_1 ;
  ansis:type "material" ;
.

my:Horizon_A12 is another resource, the details of which are not shown here.

Other observations are made on 'samples' that are not material samples, but still have enough of the same structure that we can use the same 'schema' - for example:

my:O5-upperdepth-A12-in-P1
  a sosa:Observation ;
  skos:prefLabel "Measurement of upper depth of horizon A12 in profile P1" ;
  sosa:hasFeatureOfInterest my:A12-in-P1 ;
  sosa:hasSimpleResult "17 cm" ;
  sosa:madeBySensor <https://orcid.org/0000-0002-2991-2308> ;
  sosa:observedProperty ansis:depth-upper ;
  sosa:resultTime "2022-09-13T15:35:00"^^xsd:dateTime ;
  sosa:usedProcedure <https://example.org/soil/procedure/P723h> ;
.

In this case the 'feature-of-interest' is Horizon A12 within Profile 1. This is still described as a 'SoilSample' because the information we need to describe it is similar enough to material samples.

my:A12-in-P1
  a ansis:SoilSample ;
  skos:prefLabel "Layer A12 in Profile 1" ;
  sosa:isSampleOf my:Horizon_A12 ;
  ansis:relatedProfile my:Profile_1 ;
  ansis:type "in-place" ;
.

That is the working hypothesis at the conceptual level, and is what is described in the ANSIS Info Model 1.2 document. It is based on the existing O&M/SOSA schema which is also used by TERN and BDR.

The code examples are in https://github.com/ANZSoilData/def-au-domain/blob/main/rdf/soil-observation-examples.ttl (JSON in https://github.com/ANZSoilData/def-au-domain/blob/main/rdf/soil-observation-examples.jsonld )

dr-shorthair commented 1 year ago

In https://github.com/ANZSoilData/def-au-domain/blob/main/rdf/domain.ttl I have added schema:domainIncludes annotations on all the properties. A list of properties sorted by class (feature-type) can then be extracted using this SPARQL query:

SELECT ?f ?o ?o1
WHERE {
    ?o a sosa:ObservableProperty ; schema:domainIncludes ?f .
    OPTIONAL { ?o1 rdfs:subPropertyOf ?o . }
}

dr-shorthair commented 1 year ago

To generate a SKOS view of properties sorted by entity-type:

CONSTRUCT { 
    ?f a skos:Collection ; skos:prefLabel ?fl ; skos:member ?o .
    ?o a skos:Concept ; skos:prefLabel ?pl . }
WHERE {
    ?o schema:domainIncludes ?f ; a sosa:ObservableProperty ; skos:prefLabel ?pl .
    ?f skos:prefLabel ?fl . 
}
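
The grouping that the CONSTRUCT query performs can be pictured with a short Python sketch: each feature type becomes a collection whose members are the properties annotated with it. The (property, feature) pairs below are illustrative only; `ansis:SoilHorizon` in particular is a hypothetical class name, and the pairings are not taken from domain.ttl.

```python
from collections import defaultdict

# Illustrative schema:domainIncludes pairs as (property, feature type); not real model content.
pairs = [
    ("ansis:density-bulk", "ansis:SoilSample"),
    ("ansis:depth-upper", "ansis:SoilSample"),
    ("ansis:depth-upper", "ansis:SoilHorizon"),
]


def collections_by_feature(domain_pairs):
    """Group properties under each feature type, like skos:Collection / skos:member."""
    grouped = defaultdict(list)
    for prop, feature in domain_pairs:
        grouped[feature].append(prop)
    return {feature: sorted(props) for feature, props in sorted(grouped.items())}


collections = collections_by_feature(pairs)
```

A property annotated with more than one feature type, like the depth example here, appears as a member of each corresponding collection.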

dr-shorthair commented 1 year ago

I've loaded this into RVA demo server - https://demo.vocabs.ardc.edu.au/viewById/966

@abhritchie @bsimons14 @GGrealish @ljgregory would be interested in your comments.