ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

Which is the best way to represent ontological terms representing observation types of variableMeasured? #27

Closed kitchenprinzessin3880 closed 2 years ago

kitchenprinzessin3880 commented 5 years ago

The measured variables of a dataset, as specified by users/curators, are unstructured and complex, e.g., DEPTH,sediment/rock. In PANGAEA, to enable meaningful descriptions of measured variables, we annotate them with relevant ontological terms. How can we represent the ontological terms (annotations) of a measured variable through Schema.org? An example is given below; comments and suggestions are welcome.

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "DEPTH,sediment/rock",
      "subjectOf": {
        "@type": "DefinedTermSet",
        "hasDefinedTerm": [
          {
            "@type": "DefinedTerm",
            "@id": "http://purl.obolibrary.org/obo/PATO_0001595",
            "name": "DEPTH"
          },
          {
            "@type": "DefinedTerm",
            "@id": "http://purl.obolibrary.org/obo/ENVO_00002007",
            "name": "Sediment"
          },
          {
            "@type": "DefinedTerm",
            "@id": "http://purl.obolibrary.org/obo/ENVO_00001995",
            "name": "Rock"
          }
        ]
      }
    }
  ]
}
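
For anyone generating this markup programmatically, here is a minimal Python sketch of how the annotation above could be built from a list of term IRIs and labels. The helper name `annotate_variable` is hypothetical; the schema.org types, properties, and term IRIs are the ones from the example, and the subjectOf/DefinedTermSet layout is just the option being asked about, not settled guidance.

```python
import json


def annotate_variable(name, terms):
    """Build a PropertyValue annotated with ontology terms.

    `terms` is a list of (iri, label) pairs. The function name and the
    argument layout are illustrative assumptions.
    """
    return {
        "@type": "PropertyValue",
        "name": name,
        "subjectOf": {
            "@type": "DefinedTermSet",
            "hasDefinedTerm": [
                {"@type": "DefinedTerm", "@id": iri, "name": label}
                for iri, label in terms
            ],
        },
    }


dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "variableMeasured": [
        annotate_variable(
            "DEPTH,sediment/rock",
            [
                ("http://purl.obolibrary.org/obo/PATO_0001595", "DEPTH"),
                ("http://purl.obolibrary.org/obo/ENVO_00002007", "Sediment"),
                ("http://purl.obolibrary.org/obo/ENVO_00001995", "Rock"),
            ],
        )
    ],
}

# Serialize to the JSON-LD text that would be embedded in a landing page.
print(json.dumps(dataset, indent=2))
```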

smrgeoinfo commented 4 years ago

@kitchenprinzessin3880 I'm not sure I understand the property described in your example. Is it depth to sediment/rock interface, like 'depth to bedrock'? Something like 'soil depth'?

It seems to me that the propertyID on the PropertyValue is intended for this use, providing a registered identifier for a property. See https://github.com/ESIPFed/science-on-schema.org/issues/24. I would update @ashepherd 's example in https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#variables like this:

 "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "latitude",
      "propertyID": "gsn-quantity:latitude",
      "url": "https://www.sample-data-repository.org/dataset-parameter/665787",
      "description": "Latitude where water samples were collected; north is positive.",
      "unitText": "decimal degrees",
      "minValue": "45.0",
      "maxValue": "15.0"
    },

And your example:

"variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "DEPTH,sediment/rock",
      "propertyID": "http://purl.jp/bio/4/id/201006028017141570",
      "description": "depth to interface between soil and underlying sediment or rock",
      "unitCode": "MTR"
    }
]

mbjones commented 4 years ago

In general this looks good, and I like the use of PropertyValue. I think we could provide more guidance on how to represent units consistently. @mpsaloha, thoughts on this structure? See also the current guidance docs on this.

ashepherd commented 4 years ago

Monitor https://github.com/EnvironmentOntology/envo/issues/909 for using observed properties linked to ENVO in schema.org

kitchenprinzessin3880 commented 4 years ago

@smrgeoinfo A 'measured variable' is unstructured text. It may be composed primarily of an observed property (or a physical quantity) plus other concepts such as feature of interest, units, aggregate functions (e.g., average and maximum), method, device, location, and time. Examples of parameters are ‘Methane, daily formation rate per unit sediment mass’ and ‘Practical salinity of the water body by CTD and computation using UNESCO 1983 algorithm’.

smrgeoinfo commented 4 years ago

I'd say a 'measured variable' is a concept that can be described in unstructured text or in some structured data object. In the examples above, couldn't the propertyID identify anything from 'temperature' (very generic) to the more specific variable concepts in your examples, which combine phenomenon, feature of interest, and measurement procedure?

dr-shorthair commented 4 years ago

sosa:hasFeatureOfInterest is more related to sdo:object.

FWIW I did a rough alignment of SSN/SOSA to schema.org a couple of years ago - see https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-sdo-mapping.ttl

Also a bit of a proposal for additions to SDO here: https://w3c.github.io/sdw/ssn/rdf/sdo-sosa-schema.rdfa.html

Unfinished business, but the proposed alignment might be of interest here.

njarboe commented 4 years ago

Here is an example of how we are using it for contributions at the Magnetics Information Consortium (MagIC) data repository. These are from an automatic summary of the dataset. I see now that we should be able to easily add unitText values and, with a bit of work, unitCode values. The name and description values come straight from our controlled vocab lists.

"variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "Direction MAD Free-Floating",
      "description": "Maximum Angular Deviation (MAD) of the free-floating directional PCA fits to the paleomagnetic vector",
      "minValue": 0.12366669,
      "maxValue": 26.6
    },
    {
      "@type": "PropertyValue",
      "name": "Latitude",
      "description": "Sample geographic location, Latitude",
      "minValue": 76.52167,
      "maxValue": 77.58917
    },
    {
      "@type": "PropertyValue",
      "name": "Inclination",
      "description": "Directions in specimen coordinates, Inclination",
      "minValue": -58.6,
      "maxValue": 89.95675754
    },
    {
      "@type": "PropertyValue",
      "name": "ARM Normalized Relative Paleointensity",
      "description": "Relative field strength estimated with NRM normalized by laboratory ARM",
      "minValue": 0.02678768,
      "maxValue": 1
    },
    {
      "@type": "PropertyValue",
      "name": "Susceptibility Normalized Relative Paleointensity",
      "description": "Relative field strength estimated with NRM normalized by susceptibility",
      "minValue": 0.030325174,
      "maxValue": 185.4786585
    },
    {
      "@type": "PropertyValue",
      "name": "Magnetization Volume",
      "description": "Measured intensity of magnetization, Volume normalized",
      "minValue": 0.0020875,
      "maxValue": 0.36771
    },
    {
      "@type": "PropertyValue",
      "name": "Susceptibility X Volume",
      "description": "Magnetic susceptibility, Volume normalized",
      "minValue": 0.000133748,
      "maxValue": 0.004453333
    },

smrgeoinfo commented 4 years ago

So what are the outstanding issues here?

kaiiam commented 4 years ago

@ashepherd

Monitor EnvironmentOntology/envo#909 for using observed properties linked to ENVO in schema.org

That issue wasn't about schema.org per se; it was about where OBO entity-quality pairings, e.g., concentration of chlorophyll a in liquid water, should be housed. We ended up deciding to create them in ENVO.

Regarding @kitchenprinzessin3880's issue, the following are relevant ontological concepts to more thoroughly represent the data. I'm showing examples from OBO/ENVO but other ontologies could also be used:

1) Entity: e.g., sediment or rock

2) Characteristic/Quality: e.g., depth of sediment (we can create entity quality pairings like this in ENVO using sediment and PATO:depth in the axiom).

3) Standard/Unit: e.g., meter

4) A measurement technique or protocol could also be relevant.

5) We might also want to add an environmental context concept to describe the environment where the sample was collected, e.g., tundra biome and/or mine.

Finally, regarding encapsulation of these concepts within schema.org, in a variableMeasured block, @mpsaloha is taking a pass at it, working from what @smrgeoinfo proposed above. He is trying to leverage @dr-shorthair's SSN/schema.org mapping to minimize the number of new schema.org terms we need to add. However, as he explained to me, the schema.org terms are a bit sparse, and it's a little tricky to make sure they are used correctly.

kitchenprinzessin3880 commented 4 years ago

@kaiiam agreed, schema.org has limited properties to represent the 'concepts' (e.g., feature, specimen) that form a parameter. So, based on the discussion above, I can see several options:

  1. use existing schema.org properties to specify the concepts corresponding to a parameter, e.g., subjectOf, DefinedTermSet
  2. extend it with external vocabularies (e.g., ssn/sosa), e.g., via multiple @context entries

mpsaloha commented 4 years ago

We probably want to conform with some Observations/Measurements model (SOSA, EQ, OBOE, etc.), as many semantic efforts in the environmental sciences have converged on the utility of these for enhancing search and interpretation. Schema.org doesn't yet seem to have the appropriate Types or Properties for this purpose. I agree with @kitchenprinzessin3880 that multiple @context entries might be the best way to proceed within the "variableMeasured" property. Kai and I hope to explore how multiple context references work in the upcoming weeks. Ultimately "some" Obs/Meas model is needed that, e.g., differentiates the entity (feature of interest) from the characteristic (quality, or observable property) and includes slots for units, all referencing dereferenceable HTTP IRIs of terms in RDF/OWL vocabularies. Initially we can probably do this through an external context reference and, through demonstrable use, get it formally incorporated into the schema.org vocabulary.
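
To make the multiple-@context idea concrete, here is a rough Python sketch that builds one variable with a second context entry for SOSA and separates the characteristic from the entity. Attaching `sosa:observedProperty` and `sosa:hasFeatureOfInterest` directly to a PropertyValue is an assumption made for illustration only, not something agreed in this thread; the term IRIs and unit code are reused from the examples above, and `expand` is a hypothetical, much simplified stand-in for JSON-LD compact IRI expansion.

```python
import json

variable = {
    "@context": [
        "http://schema.org/",
        {"sosa": "http://www.w3.org/ns/sosa/"},  # second context: SOSA prefix
    ],
    "@type": "PropertyValue",
    "name": "DEPTH,sediment/rock",
    "unitCode": "MTR",
    # Characteristic (observable property): PATO depth. Assumed usage.
    "sosa:observedProperty": {
        "@id": "http://purl.obolibrary.org/obo/PATO_0001595"
    },
    # Entity (feature of interest): ENVO sediment. Assumed usage.
    "sosa:hasFeatureOfInterest": {
        "@id": "http://purl.obolibrary.org/obo/ENVO_00002007"
    },
}


def expand(term, contexts):
    """Expand a prefix:suffix key using the dict entries of an @context list.

    Simplified on purpose: a real JSON-LD processor handles many more cases.
    """
    prefix, _, suffix = term.partition(":")
    for ctx in contexts:
        if isinstance(ctx, dict) and prefix in ctx:
            return ctx[prefix] + suffix
    return term  # no matching prefix, leave the key as it is


print(expand("sosa:observedProperty", variable["@context"]))
print(json.dumps(variable, indent=2))
```

The point of the second context entry is that the SOSA terms resolve to full IRIs without any new schema.org terms being minted.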

dr-shorthair commented 4 years ago

@mpsaloha if you hadn't seen it, take a peek at https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-sdo-mapping.ttl

kitchenprinzessin3880 commented 4 years ago

@ashepherd - can we close the issue now, as we are now familiar with the different mechanisms for representing parameters using schema.org?

ashepherd commented 4 years ago

@kitchenprinzessin3880 we are planning to close it as a part of our v1.2 release, and are planning to work out our updated recommendation for using variableMeasured at the ESIP Summer Meeting.

kitchenprinzessin3880 commented 4 years ago

@ashepherd

@kitchenprinzessin3880 we are planning to close it as a part of our v1.2 release, and are planning to work out our updated recommendation for using variableMeasured at the ESIP Summer Meeting.

noted ;)

mariutzica commented 4 years ago

@smrgeoinfo I'd like to provide the following documentation for SVO for the DiscussionVariableMeasured.md file, where I notice it says that SVO does not provide a vocabulary for its model. SVO does have a vocabulary which can be found in the lower ontology. More details below ...

Scientific Variables Ontology (SVO)

SVO has separate vocabularies for Variables, Properties and observed Phenomena.

Current entries are documented (with #) at the namespaces http://geoscienceontology.org/svo/svl/variable, http://geoscienceontology.org/svo/svl/property, and http://geoscienceontology.org/svo/svl/phenomenon, respectively.

Raw ttl and rdf files can be obtained with curl (instructions) or via browser at, e.g. http://geoscienceontology.org/svo/svl/variable/1.0.0/svo-lower-variable.ttl (or .rdf), http://geoscienceontology.org/svo/svl/variable/1.0.0/svo-lower-property.ttl, etc.

mbjones commented 4 years ago

Thanks @mariutzica !

smrgeoinfo commented 4 years ago

Check out draft proposals for updates to the Variables discussion in Dataset.md

Draft ADR

Discussion of problem, recommendations, and open issues

These are all DRAFTs for discussion, but try to present some concrete recommendations to evaluate.

smrgeoinfo commented 4 years ago

@mariutzica, those are good links to see more of SVO. For use in variableMeasured/PropertyValue/propertyID URI, what I think we need is a URI that will dereference at least to an html page explaining the variable, e.g. http://registry2.it.csiro.au/def/property/nitrate_concentration or http://purl.obolibrary.org/obo/ENVO_3100022.

The SVO variables have URIs that are fragment identifiers in a big html page, e.g. http://www.geoscienceontology.org/svo/svl/property/1.0.0/#hydraulic_conductivity.

Ideally there should be a way to dereference the URI to get an rdf representation of the variable description.
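
A small Python sketch of what that dereferencing looks like from the client side, using only the standard library. It builds (but does not send) a request that asks for Turtle via the Accept header, and it shows why a fragment URI cannot return a per-variable description: the fragment is split off by the client and never reaches the server. Whether any of the servers mentioned here actually honor `text/turtle` content negotiation is not something I have checked; `rdf_request` is a hypothetical helper name.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def rdf_request(uri, media_type="text/turtle"):
    """Build a request for an RDF serialization of `uri` without sending it."""
    # The fragment is resolved client-side, so the server only ever sees
    # the part of the URI before '#'.
    url, fragment = urldefrag(uri)
    return Request(url, headers={"Accept": media_type}), fragment


request, fragment = rdf_request(
    "http://www.geoscienceontology.org/svo/svl/property/1.0.0/#hydraulic_conductivity"
)
print(request.full_url)              # what the server would be asked for
print(fragment)                      # what stays on the client
print(request.get_header("Accept"))  # requested serialization
```

Passing `request` to `urllib.request.urlopen` would perform the actual fetch.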

mbjones commented 4 years ago

@smrgeoinfo @kaiiam @mpsaloha I just read over the draft proposed updates to the Variables section, and overall they look great. I did flag one thing, which was the intended usage for minValue and maxValue. Here's the current text:

minValue. If the value for the variable is numeric, this is the minimum value that occurs in the dataset. Not useful for other value types.

maxValue. If the value for the variable is numeric, this is the maximum value that occurs in the dataset. Not useful for other value types.

That basically uses min and max to list the range of the values in the dataset. While I suppose that might be useful for discovery use cases, for data interpretation, I think an expression of the domain/bounds would be more useful. In that case, we'd be asking for the min and max allowable values for the variable, even if those values are not in the data set per se. This is useful for proper interpretation and quality assessment (e.g., do any observed values fall outside of the domain). For example, for a tree diameter, which is a length, the allowed bounds might be minValue > 0. I'm not sure if this fits in the spirit of the schema:minValue definition, which is quite general. Maybe the domain min and max would be in addition to the range expression? Also, some domain expressions would be hard to express with just a numeric value -- we need comparison operators too, such as '>= 0' as well as ways of indicating, e.g., number types (e.g., "positive integers" or "all real numbers -10.0 <=x <= 10.0"). Any thoughts on the utility of a domain expression versus a range expression?
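
To illustrate the distinction, here is a minimal Python sketch that keeps the two notions apart: the observed range is computed from the data, while the domain is declared separately and can carry strict or inclusive bounds. The `Domain` class and both helper functions are hypothetical names, not schema.org terms, and the sample diameters are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    """Allowed bounds for a variable; None means unbounded on that side."""
    lower: Optional[float] = None
    upper: Optional[float] = None
    lower_inclusive: bool = True
    upper_inclusive: bool = True

    def contains(self, value: float) -> bool:
        if self.lower is not None:
            if value < self.lower or (value == self.lower and not self.lower_inclusive):
                return False
        if self.upper is not None:
            if value > self.upper or (value == self.upper and not self.upper_inclusive):
                return False
        return True


def observed_range(values):
    """Range expression: the min and max that actually occur in the data."""
    return min(values), max(values)


def out_of_domain(values, domain):
    """Quality check: observed values that fall outside the allowed domain."""
    return [v for v in values if not domain.contains(v)]


tree_diameter = Domain(lower=0.0, lower_inclusive=False)  # diameter > 0
bounded = Domain(lower=-10.0, upper=10.0)                 # -10.0 <= x <= 10.0
diameters = [12.5, 0.0, 30.1, -1.0]                       # made-up sample values

print(observed_range(diameters))
print(out_of_domain(diameters, tree_diameter))
print(out_of_domain(diameters, bounded))
```

A single numeric minValue cannot express the strict bound in `tree_diameter`, which is the gap described above.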

smrgeoinfo commented 4 years ago

Great question. On the one hand, there is constraining valid values on the data type; on the other, a description of the actual values in a data instance. I can imagine use cases for either kind of attribute. Needs some thought. Do we need to enable both?

njarboe commented 4 years ago

@kitchenprinzessin3880 In the title of this issue it should be "variableMeasured" not "measuredVariable". If you would change that it might prevent some confusion by others starting to read the thread.

njarboe commented 4 years ago

MagIC is using minValue and maxValue for the range of values in the dataset (see the example earlier in the thread). I think the scientists using our data repository would like to be able to search for datasets that have attributes in certain ranges. In our case the allowable values are either not very interesting or unconstrained. This could be different for others, so I do think being able to describe both the dataset range and the allowable range would be useful.
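
As a rough illustration of that use case (not MagIC's actual code), the Python sketch below derives minValue and maxValue from column data and then tests whether a variable's observed range overlaps a search interval. The function names, the input structures, and the sample numbers are all made up for the example; only the variable names and descriptions echo the earlier MagIC snippet.

```python
def summarize(columns, descriptions):
    """Build PropertyValue entries with the observed min and max per column."""
    entries = []
    for name, values in columns.items():
        numeric = [v for v in values if v is not None]
        if not numeric:
            continue  # skip columns with no numeric values
        entries.append({
            "@type": "PropertyValue",
            "name": name,
            "description": descriptions.get(name, ""),
            "minValue": min(numeric),
            "maxValue": max(numeric),
        })
    return entries


def in_range(entries, name, low, high):
    """True if the named variable's observed range overlaps [low, high]."""
    return any(
        e["name"] == name and e["minValue"] <= high and e["maxValue"] >= low
        for e in entries
    )


# Made-up sample values for illustration.
columns = {
    "Latitude": [76.6, 77.1, None, 77.5],
    "Inclination": [-58.6, 12.0, 89.9],
}
descriptions = {
    "Latitude": "Sample geographic location, Latitude",
    "Inclination": "Directions in specimen coordinates, Inclination",
}

entries = summarize(columns, descriptions)
print(in_range(entries, "Latitude", 70, 77))  # search interval touches the data
print(in_range(entries, "Latitude", 0, 60))   # search interval misses the data
```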

kitchenprinzessin3880 commented 4 years ago

@smrgeoinfo thanks for updating the title;)

smrgeoinfo commented 4 years ago

Note-- Ongoing discussion of this issue is now in a google doc to make editing by the workgroup easier. The current suggested recommendations are here. The discussion document includes a bunch of background material as well.

uschindler commented 3 years ago

@kitchenprinzessin3880 's idea was implemented for PANGAEA, see this example: https://doi.pangaea.de/10.1594/PANGAEA.770309?format=metadata_jsonld

smrgeoinfo commented 3 years ago

@uschindler it sure seems that PropertyValue/about/DefinedTermSet would make more sense than subjectOf.

uschindler commented 3 years ago

Not sure how this should work: "about" is not a property of PropertyValue or Thing. From the schema.org description, both look fine; it depends on your standpoint. We can discuss that; a change is easy. All of that is just an XSLT from PANGAEA's native metadata schema.

smrgeoinfo commented 3 years ago

See https://github.com/ESIPFed/science-on-schema.org/tree/issue27-measuredVariable/examples/dataset for some example datasets and encoding approaches.

See https://github.com/ESIPFed/science-on-schema.org/tree/issue27-measuredVariable/guides for discussion of the issues in detail and draft recommendations.

See the ESIP poster for an overview.

smrgeoinfo commented 3 years ago

Spawned some new, more specific issues: #141, #142, #143, #144. Please continue there.