Closed kitchenprinzessin3880 closed 2 years ago
@kitchenprinzessin3880 I'm not sure I understand the property described in your example. Is it depth to sediment/rock interface, like 'depth to bedrock'? Something like 'soil depth'?
It seems to me that the propertyID on the PropertyValue is intended for this use, providing a registered identifier for a property. See https://github.com/ESIPFed/science-on-schema.org/issues/24. I would update @ashepherd's example in https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#variables like this:
"variableMeasured": [
  {
    "@type": "PropertyValue",
    "name": "latitude",
    "propertyID": "gsn-quantity:latitude",
    "url": "https://www.sample-data-repository.org/dataset-parameter/665787",
    "description": "Latitude where water samples were collected; north is positive.",
    "unitText": "decimal degrees",
    "minValue": "15.0",
    "maxValue": "45.0"
  },
And your example:
"variableMeasured": [
  {
    "@type": "PropertyValue",
    "name": "DEPTH,sediment/rock",
    "propertyID": "http://purl.jp/bio/4/id/201006028017141570",
    "description": "depth to interface between soil and underlying sediment or rock",
    "unitCode": "MTR"
  }
]
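Since JSON-LD keys are case-sensitive, a key such as "PropertyID" is not the schema.org propertyID term. A minimal sketch of a check for this follows. The helper name `unknown_keys` and the hand-picked property list are assumptions for illustration only; the list is a subset of what schema.org defines for PropertyValue and Thing, not a complete or authoritative one.

```python
# Subset of property names schema.org defines for PropertyValue (and inherits
# from Thing). Hand-picked for illustration; not the full list.
KNOWN_PROPERTIES = {
    "name", "description", "url", "identifier", "alternateName", "sameAs",
    "subjectOf", "propertyID", "value", "valueReference", "unitText",
    "unitCode", "minValue", "maxValue", "measurementTechnique",
}


def unknown_keys(property_value):
    """Return keys that are neither JSON-LD keywords (@...) nor in the subset above."""
    return sorted(
        key
        for key in property_value
        if not key.startswith("@") and key not in KNOWN_PROPERTIES
    )


# The depth example from this thread.
depth = {
    "@type": "PropertyValue",
    "name": "DEPTH,sediment/rock",
    "propertyID": "http://purl.jp/bio/4/id/201006028017141570",
    "description": "depth to interface between soil and underlying sediment or rock",
    "unitCode": "MTR",
}

# A hypothetical entry with a mis-cased key.
miscased = {
    "@type": "PropertyValue",
    "name": "latitude",
    "PropertyID": "gsn-quantity:latitude",
}

print(unknown_keys(depth))
print(unknown_keys(miscased))
```

Keys from other vocabularies (for example, prefixed terms) would also be reported by this sketch, so the result is a prompt for review rather than a validity verdict.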
In general this looks good, and I like the use of PropertyValue. I think we could provide more guidance on how to consistently represent units. @mpsaloha, thoughts on this structure? See also the current guidance docs on this.
Monitor https://github.com/EnvironmentOntology/envo/issues/909 for using observed properties linked to ENVO in schema.org
@smrgeoinfo A 'measured variable' is unstructured text that may be composed primarily of an observed property (or a physical quantity) plus other concepts such as feature-of-interest, units, aggregate functions (e.g., average and maximum), method, device, location, and time. Examples of parameters are ‘Methane, daily formation rate per unit sediment mass’ and ‘Practical salinity of the water body by CTD and computation using UNESCO 1983 algorithm’.
I'd say a 'measured variable' is a concept that can be described in unstructured text or in some structured data object. In the examples above, couldn't the propertyID identify things ranging from 'temperature' (very generic) to the more specific phenomenon/feature of interest/measurement procedure variable concepts like those in your examples?
sosa:hasFeatureOfInterest is more related to sdo:object.
FWIW I did a rough alignment of SSN/SOSA to schema.org a couple of years ago - see https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-sdo-mapping.ttl
Also a bit of a proposal for additions to SDO here: https://w3c.github.io/sdw/ssn/rdf/sdo-sosa-schema.rdfa.html
Unfinished business, but the proposed alignment might be of interest here.
Here is an example of how we are using it for contributions at the Magnetics Information Consortium (MagIC) data repository. These are from an automatic summary of the dataset. I see now that we should be able to easily add unitText values and, with a bit of work, unitCode values. The name and description values come straight from our controlled vocab lists.
"variableMeasured": [
  {
    "@type": "PropertyValue",
    "name": "Direction MAD Free-Floating",
    "description": "Maximum Angular Deviation (MAD) of the free-floating directional PCA fits to the paleomagnetic vector",
    "minValue": 0.12366669,
    "maxValue": 26.6
  },
  {
    "@type": "PropertyValue",
    "name": "Latitude",
    "description": "Sample geographic location, Latitude",
    "minValue": 76.52167,
    "maxValue": 77.58917
  },
  {
    "@type": "PropertyValue",
    "name": "Inclination",
    "description": "Directions in specimen coordinates, Inclination",
    "minValue": -58.6,
    "maxValue": 89.95675754
  },
  {
    "@type": "PropertyValue",
    "name": "ARM Normalized Relative Paleointensity",
    "description": "Relative field strength estimated with NRM normalized by laboratory ARM",
    "minValue": 0.02678768,
    "maxValue": 1
  },
  {
    "@type": "PropertyValue",
    "name": "Susceptibility Normalized Relative Paleointensity",
    "description": "Relative field strength estimated with NRM normalized by susceptibility",
    "minValue": 0.030325174,
    "maxValue": 185.4786585
  },
  {
    "@type": "PropertyValue",
    "name": "Magnetization Volume",
    "description": "Measured intensity of magnetization, Volume normalized",
    "minValue": 0.0020875,
    "maxValue": 0.36771
  },
  {
    "@type": "PropertyValue",
    "name": "Susceptibility X Volume",
    "description": "Magnetic susceptibility, Volume normalized",
    "minValue": 0.000133748,
    "maxValue": 0.004453333
  },
So what are the outstanding issues here?
@ashepherd
Monitor EnvironmentOntology/envo#909 for using observed properties linked to ENVO in schema.org
This issue wasn't about schema.org per se; it was about where OBO entity-quality pairings, e.g. concentration of chlorophyll a in liquid water, should be housed. We ended up deciding to create them in ENVO.
Regarding @kitchenprinzessin3880's issue, the following ontological concepts are relevant for representing the data more thoroughly. I'm showing examples from OBO/ENVO, but other ontologies could also be used:
1) Entity: e.g., sediment or rock
2) Characteristic/Quality: e.g., depth of sediment (we can create entity-quality pairings like this in ENVO using sediment and PATO:depth in the axiom).
3) Standard/Unit: e.g., meter
4) A measurement technique or protocol could perhaps also be relevant.
5) We might also want to add an environmental context concept to describe the environment where the sample was collected, e.g., tundra biome and/or mine.
Finally, regarding encapsulation of these concepts within schema.org in a variableMeasured block, @mpsaloha is taking a pass at it, working off what @smrgeoinfo proposed above. He is trying to leverage @dr-shorthair's SSN/Schema.org mapping to minimize the number of new schema.org terms we need to add. However, as he was explaining to me, the schema.org terms are a bit sparse and it's a little tricky to make sure they are used correctly.
@kaiiam agreed, schema.org has limited properties to represent the 'concepts' (e.g., feature, specimen) that form a parameter. So, based on the discussion above, I can see several options:
We probably want to conform with some Observations/Measurements model (SOSA, EQ, OBOE, etc.), as many semantic efforts in the environmental sciences have converged on the utility of these for enhancing search and interpretation. Schema.org doesn't yet seem to have the appropriate Types or Properties for this purpose. I agree with @kitchenprinzessin3880 that multiple @context references might be the best way to proceed within the "variableMeasured" property. Kai and I hope to explore how multiple context references work in the upcoming weeks. Ultimately "some" Obs/Meas model is needed that, e.g., differentiates the entity (feature of interest) from the characteristic (quality, or observable property) and includes slots for units, all referencing dereferenceable HTTP IRIs of terms in RDF/OWL vocabularies. Initially we can probably do this through an external context reference and, through demonstrable use, get it formally incorporated into the schema.org vocabulary.
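To make the multiple-@context idea concrete, here is a rough sketch that builds a Dataset whose variableMeasured entry separates the characteristic from the entities using SOSA terms. Attaching sosa:observedProperty and sosa:hasFeatureOfInterest directly to a PropertyValue is an assumption made for illustration, not an agreed recommendation; the helper name `annotated_variable` is hypothetical, and the PATO/ENVO IRIs are the ones from the PANGAEA example in this issue.

```python
import json

SCHEMA = "http://schema.org/"
SOSA = "http://www.w3.org/ns/sosa/"


def annotated_variable(name, observed_property, features, unit_code=None):
    """Build a PropertyValue that keeps the quality and the entities apart."""
    variable = {
        "@type": "PropertyValue",
        "name": name,
        # Characteristic / quality, e.g. PATO depth.
        "sosa:observedProperty": {"@id": observed_property},
        # Entities the quality applies to, e.g. ENVO sediment and rock.
        "sosa:hasFeatureOfInterest": [{"@id": iri} for iri in features],
    }
    if unit_code is not None:
        variable["unitCode"] = unit_code
    return variable


dataset = {
    # schema.org stays the default vocabulary; "sosa" is added as a prefix.
    "@context": [SCHEMA, {"sosa": SOSA}],
    "@type": "Dataset",
    "variableMeasured": [
        annotated_variable(
            "DEPTH,sediment/rock",
            "http://purl.obolibrary.org/obo/PATO_0001595",
            [
                "http://purl.obolibrary.org/obo/ENVO_00002007",
                "http://purl.obolibrary.org/obo/ENVO_00001995",
            ],
            unit_code="MTR",
        )
    ],
}

print(json.dumps(dataset, indent=2))
```

Whether harvesters that only understand schema.org would make use of the prefixed terms is an open question; the schema.org-only keys (name, unitCode) remain usable either way.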
@mpsaloha if you hadn't seen it, take a peek at https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-sdo-mapping.ttl
@ashepherd - can we close the issue now, as we are now familiar with different mechanisms of representing parameters using schema.org?
@kitchenprinzessin3880 we are planning to close it as a part of our v1.2 release, and are planning to work out our updated recommendation for using variableMeasured at the ESIP Summer Meeting.
noted ;)
@smrgeoinfo I'd like to provide the following documentation for SVO for the DiscussionVariableMeasured.md file, where I notice it says that SVO does not provide a vocabulary for its model. SVO does have a vocabulary which can be found in the lower ontology. More details below ...
Scientific Variables Ontology (SVO)
SVO has separate vocabularies for Variables, Properties and observed Phenomena.
Current entries are documented (with #) at the namespaces http://geoscienceontology.org/svo/svl/variable, http://geoscienceontology.org/svo/svl/property, and http://geoscienceontology.org/svo/svl/phenomenon, respectively.
Raw ttl and rdf files can be obtained with curl (instructions) or via browser at, e.g. http://geoscienceontology.org/svo/svl/variable/1.0.0/svo-lower-variable.ttl (or .rdf), http://geoscienceontology.org/svo/svl/variable/1.0.0/svo-lower-property.ttl, etc.
Thanks @mariutzica !
Check out draft proposals for updates to the Variables discussion in Dataset.md
Discussion of problem, recommendations, and open issues
These are all DRAFTs for discussion, but try to present some concrete recommendations to evaluate.
@mariutzica, those are good links to see more of SVO. For use in variableMeasured/PropertyValue/propertyID URI, what I think we need is a URI that will dereference at least to an HTML page explaining the variable, e.g. http://registry2.it.csiro.au/def/property/nitrate_concentration or http://purl.obolibrary.org/obo/ENVO_3100022.
The SVO variables have URIs that are fragment identifiers in a big html page, e.g. http://www.geoscienceontology.org/svo/svl/property/1.0.0/#hydraulic_conductivity.
Ideally there should be a way to dereference the URI to get an rdf representation of the variable description.
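As a small illustration of why fragment identifiers behave differently: the fragment is never sent to the server, so a client can only ask for the whole document and then look up the fragment locally. The sketch below only prepares a request with an Accept header and does not send it; the helper name `build_rdf_request` is hypothetical, and whether any given server honors content negotiation for text/turtle is an assumption that would need checking.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_rdf_request(uri, accept="text/turtle"):
    """Split off the fragment and prepare a request for the enclosing document."""
    document_url, fragment = urldefrag(uri)
    # Only document_url goes over the wire; the fragment stays client-side.
    request = Request(document_url, headers={"Accept": accept})
    return request, fragment


svo_uri = "http://www.geoscienceontology.org/svo/svl/property/1.0.0/#hydraulic_conductivity"
request, fragment = build_rdf_request(svo_uri)

print(request.full_url)
print(request.get_header("Accept"))
print(fragment)
```

For a URI without a fragment, such as the ENVO PURL above, the same helper returns an empty fragment and the request targets the term directly.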
@smrgeoinfo @kaiiam @mpsaloha I just read over the draft proposed updates to the Variables section, and overall they look great. I did flag one thing, which was the intended usage for minValue and maxValue. Here's the current text:
minValue. If the value for the variable is numeric, this is the minimum value that occurs in the dataset. Not useful for other value types.
maxValue. If the value for the variable is numeric, this is the maximum value that occurs in the dataset. Not useful for other value types.
That basically uses min and max to list the range of the values in the dataset. While I suppose that might be useful for discovery use cases, for data interpretation, I think an expression of the domain/bounds would be more useful. In that case, we'd be asking for the min and max allowable values for the variable, even if those values are not in the data set per se. This is useful for proper interpretation and quality assessment (e.g., do any observed values fall outside of the domain). For example, for a tree diameter, which is a length, the allowed bounds might be minValue > 0. I'm not sure if this fits in the spirit of the schema:minValue definition, which is quite general. Maybe the domain min and max would be in addition to the range expression? Also, some domain expressions would be hard to express with just a numeric value -- we need comparison operators too, such as '>= 0', as well as ways of indicating number types (e.g., "positive integers" or "all real numbers -10.0 <= x <= 10.0"). Any thoughts on the utility of a domain expression versus a range expression?
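To illustrate the difference between the two readings, here is a rough sketch that keeps the allowable domain (with open or closed bounds) separate from the range actually observed in the data. The `Domain` class, the helper names, and the sample diameters are all made up for illustration; nothing here is part of schema.org.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Domain:
    """Allowable values for a variable; None means unbounded on that side."""
    lower: Optional[float] = None
    upper: Optional[float] = None
    lower_inclusive: bool = True
    upper_inclusive: bool = True

    def contains(self, x: float) -> bool:
        if self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_inclusive):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_inclusive):
                return False
        return True


def observed_range(values: Sequence[float]) -> Tuple[float, float]:
    """Range reading: the min and max that actually occur in the dataset."""
    return min(values), max(values)


def out_of_domain(values: Sequence[float], domain: Domain) -> List[float]:
    """Domain reading: values that fall outside the allowable bounds."""
    return [v for v in values if not domain.contains(v)]


# Made-up tree diameters; a length must be strictly greater than zero.
tree_diameters = [12.5, 0.0, 30.1, -1.0]
positive_length = Domain(lower=0.0, lower_inclusive=False)

print(observed_range(tree_diameters))
print(out_of_domain(tree_diameters, positive_length))
```

The observed range alone cannot flag the zero and negative diameters as suspect, which is the quality-assessment use case described above.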
Great question. On the one hand, there are constraints on the valid values for the data type; on the other, a description of the actual values in a data instance. I can imagine use cases for either kind of attribute. Needs some thought. Do we need to enable both?
@kitchenprinzessin3880 In the title of this issue it should be "variableMeasured" not "measuredVariable". If you would change that it might prevent some confusion by others starting to read the thread.
MagIC is using minValue and maxValue for the range of values in the dataset (see the example earlier in the thread). I think the scientists using our data repository would like to be able to search for datasets that have attributes in certain ranges. In our case, the allowable values are either not very interesting or unconstrained. This could be different for others, so I do feel that being able to describe both the dataset range and the allowable range would be useful.
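The discovery use case could look roughly like the sketch below: given a dataset summary, test whether a named variable's observed range overlaps a query interval. The function name `variable_in_range` is hypothetical, and the two entries are copied from the MagIC example earlier in the thread.

```python
def variable_in_range(dataset, name, low, high):
    """True if the named variable's [minValue, maxValue] overlaps [low, high]."""
    for variable in dataset.get("variableMeasured", []):
        if variable.get("name") != name:
            continue
        vmin, vmax = variable.get("minValue"), variable.get("maxValue")
        if vmin is None or vmax is None:
            continue  # no range recorded, so nothing to match against
        # Two closed intervals overlap when each starts before the other ends.
        if float(vmin) <= high and float(vmax) >= low:
            return True
    return False


magic_summary = {
    "@type": "Dataset",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "Latitude",
         "minValue": 76.52167, "maxValue": 77.58917},
        {"@type": "PropertyValue", "name": "Inclination",
         "minValue": -58.6, "maxValue": 89.95675754},
    ],
}

print(variable_in_range(magic_summary, "Latitude", 70, 80))
print(variable_in_range(magic_summary, "Latitude", 0, 45))
```

The `float()` conversion is there because some examples in this thread give minValue and maxValue as strings.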
@smrgeoinfo thanks for updating the title;)
Note-- Ongoing discussion of this issue is now in a google doc to make editing by the workgroup easier. The current suggested recommendations are here. The discussion document includes a bunch of background material as well.
@kitchenprinzessin3880 's idea was implemented for PANGAEA, see this example: https://doi.pangaea.de/10.1594/PANGAEA.770309?format=metadata_jsonld
@uschindler it sure seems that PropertyValue/about/DefinedTermSet would make more sense than subjectOf.
Not sure how this should work: "about" is not a property of PropertyValue or Thing. From the schema.org descriptions, both look fine; it depends on the standpoint. We can discuss that, and a change is easy. All of that is just an XSLT from PANGAEA's native metadata schema.
See https://github.com/ESIPFed/science-on-schema.org/tree/issue27-measuredVariable/examples/dataset for some example datasets and encoding approaches.
See https://github.com/ESIPFed/science-on-schema.org/tree/issue27-measuredVariable/guides for discussion of the issues in detail and draft recommendations.
See ESIP poster for overview.
Spawned some new, more specific issues: #141, #142, #143, #144. Please continue there.
The measuredVariables of a dataset specified by users/curators are unstructured and complex, e.g., DEPTH,sediment/rock. In PANGAEA, to enable meaningful descriptions of measured variables, we annotate them with relevant ontological terms. How can we represent the ontological terms (annotations) of a measured variable through Schema.org? An example is given below; comments/suggestions are welcome.
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "DEPTH,sediment/rock",
      "subjectOf": {
        "@type": "DefinedTermSet",
        "hasDefinedTerm": [
          {
            "@type": "DefinedTerm",
            "@id": "http://purl.obolibrary.org/obo/PATO_0001595",
            "name": "DEPTH"
          },
          {
            "@type": "DefinedTerm",
            "@id": "http://purl.obolibrary.org/obo/ENVO_00002007",
            "name": "Sediment"
          },
          {
            "@type": "DefinedTerm",
            "@id": "http://purl.obolibrary.org/obo/ENVO_00001995",
            "name": "Rock"
          }
        ]
      }
    }
  ]
}