ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

Referring to Controlled vocabulary terms consistently #248

Open alko-k opened 5 months ago

alko-k commented 5 months ago

Hello guys, I feel there is some inconsistency in how terms from controlled vocabularies are handled: why are keywords that come from a vocabulary typed as "DefinedTerm" while variableMeasured values are "PropertyValue"? The same goes for the unit shown here: "unitText": "decimal degrees", "unitCode": "http://qudt.org/vocab/unit/DEG".

I suggest marking up controlled vocabulary/ontology terms in a dataset consistently, referring to them either always as DefinedTerm or always as PropertyValue, so that software built to consume them does not have to change depending on the case.
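To illustrate the consumer-side cost described above, here is a minimal Python sketch of a harvester that has to branch on `@type` to pull a term URI and label out of either shape. The helper name `term_identity` and the properties it checks are hypothetical choices for illustration, not part of any SOSO or schema.org tooling.

```python
from typing import Optional, Tuple, Union


def term_identity(node: Union[str, dict]) -> Tuple[Optional[str], Optional[str]]:
    """Return (term_uri, label) for a keyword or variable node.

    Hypothetical helper: the properties it reads follow the two shapes
    discussed in this thread, not an official SOSO extraction rule.
    """
    if isinstance(node, str):
        # Plain-text keyword: there is no term URI to recover.
        return None, node
    label = node.get("name")
    node_type = node.get("@type")
    if node_type == "DefinedTerm":
        # DefinedTerm: prefer @id, fall back to url.
        return node.get("@id") or node.get("url"), label
    if node_type == "PropertyValue":
        # PropertyValue: the SOSO guidance puts the term URI in propertyID.
        return node.get("propertyID"), label
    # Any other type would need yet another branch.
    return None, label


keyword = {
    "@type": "DefinedTerm",
    "name": "OCEANS",
    "url": "https://gcmd.earthdata.nasa.gov/kms/concept/91697b7d-8f2b-4954-850e-61d5f61c867d",
}
variable = {
    "@type": "PropertyValue",
    "name": "latitude",
    "propertyID": "http://purl.obolibrary.org/obo/NCIT_C68642",
}
print(term_identity(keyword))
print(term_identity(variable))
```

With a single agreed-upon type, the two typed branches would collapse into one.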

Thank you for your great effort to describe datasets meaningfully!

Alexandra

mbjones commented 5 months ago

Hi Alexandra --

Thanks for the pointer. This is a great discussion to continue, and I agree that consistency is always good. In this case, I think our main reason was to stay consistent with the underlying schema.org recommendations, which say that http://schema.org/keywords has an expected range of http://schema.org/DefinedTerm (and a couple of other types) and that http://schema.org/variableMeasured has an expected range of http://schema.org/PropertyValue. So our guidance follows the main standard's lead, despite some variability in what it recommends. I think it would be good to understand further just what the difference between DefinedTerm and PropertyValue is.
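A minimal Python sketch of checking markup against the ranges mentioned above. `EXPECTED_RANGES` is a hand-copied subset assumed for illustration (schema.org lists further types for both properties, and the current lists should be checked on schema.org); the function names are hypothetical.

```python
# Hand-copied subset of schema.org expected ranges (an assumption for
# illustration; consult schema.org for the current, complete lists).
EXPECTED_RANGES = {
    "keywords": {"DefinedTerm", "Text", "URL"},
    "variableMeasured": {"PropertyValue", "Text"},
}


def value_type(value) -> str:
    """Approximate the schema.org type of a JSON-LD value."""
    if isinstance(value, dict):
        return value.get("@type", "Thing")
    # Plain strings are treated as Text (URL is a subtype of Text anyway).
    return "Text"


def out_of_range(dataset: dict) -> list:
    """Return (property, type) pairs whose type is outside EXPECTED_RANGES."""
    problems = []
    for prop, allowed in EXPECTED_RANGES.items():
        values = dataset.get(prop, [])
        if not isinstance(values, list):
            values = [values]
        for value in values:
            found = value_type(value)
            if found not in allowed:
                problems.append((prop, found))
    return problems


dataset = {
    "@type": "Dataset",
    "keywords": [{"@type": "DefinedTerm", "name": "OCEANS"}],
    "variableMeasured": {"@type": "DefinedTerm", "name": "pressure"},
}
print(out_of_range(dataset))  # [('variableMeasured', 'DefinedTerm')]
```

Under these assumed ranges, a DefinedTerm keyword passes while a DefinedTerm used for variableMeasured is flagged, which is the asymmetry being discussed.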

Matt

dr-shorthair commented 5 months ago

SDO is a bit confusing in this space. I would expect PropertyValue to be the result of an observation or measurement, whereas variableMeasured is about the semantics of the observation or measurement: what is being measured, rather than the magnitude of the measurement.

Then there is also https://schema.org/measuredProperty in the mix. Unfortunately, SDO did not explicitly adopt one of the pre-existing observation models, such as OBOE or O&M (later implemented in SSN).

ashepherd commented 5 months ago

@alko-k, thanks for the suggestion! I'm curious whether the responses above mean you are OK with the guidance as it is now (until schema.org makes improvements), or whether you have a proposal for how the guidance should change. Thanks again!

alko-k commented 5 months ago

Hello @ashepherd, thank you for your question. Yes, I do have a suggestion, mostly about referring to vocabulary terms consistently (thanks @mbjones :-)). A discussion would be very helpful to convey my thoughts on that. I couldn't agree more with @dr-shorthair on the PropertyValue type :-)

Here is a code snippet showing how I would envision controlled terms being referenced. I also used uom, which references a term from a vocabulary, instead of unitText and unitCode.

  {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "A dataset",
    "keywords": [
      {
        "@type": "DefinedTerm",
        "name": "OCEANS",
        "inDefinedTermSet": "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords",
        "url": "https://gcmd.earthdata.nasa.gov/kms/concept/91697b7d-8f2b-4954-850e-61d5f61c867d",
        "termCode": "91697b7d-8f2b-4954-850e-61d5f61c867d"
      }
    ],
    "variableMeasured": {
      "@type": "DefinedTerm",
      "name": "pressure",
      "inDefinedTermSet": "http://vocab.nerc.ac.uk/collection/P02/current/",
      "url": "http://vocab.nerc.ac.uk/collection/P02/current/CDTA/",
      "termCode": "CDTA",
      "uom": {
        "@type": "DefinedTerm",
        "name": "Kelvin",
        "inDefinedTermSet": "https://vocab.nerc.ac.uk/collection/P06/current/",
        "url": "http://vocab.nerc.ac.uk/collection/P06/current/UPKA/",
        "termCode": "UPKA"
      }
    }
  }
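One benefit of the uniform shape in the proposal above is that a consumer can collect every vocabulary reference with a single recursive walk, whatever property the term sits under. A minimal Python sketch, using a hypothetical helper name and a trimmed copy of the proposed markup (names and URLs dropped):

```python
def collect_defined_terms(node, found=None):
    """Walk a JSON-LD tree and collect (termCode, inDefinedTermSet, url)
    from every node typed DefinedTerm, whatever property it sits under."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("@type") == "DefinedTerm":
            found.append(
                (node.get("termCode"), node.get("inDefinedTermSet"), node.get("url"))
            )
        # Keep walking: a DefinedTerm may nest another one (e.g. uom).
        for value in node.values():
            collect_defined_terms(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_defined_terms(item, found)
    return found


# Trimmed copy of the proposed markup (names and URLs omitted).
proposal = {
    "@type": "Dataset",
    "keywords": [
        {"@type": "DefinedTerm", "termCode": "91697b7d-8f2b-4954-850e-61d5f61c867d"}
    ],
    "variableMeasured": {
        "@type": "DefinedTerm",
        "termCode": "CDTA",
        "inDefinedTermSet": "http://vocab.nerc.ac.uk/collection/P02/current/",
        "uom": {
            "@type": "DefinedTerm",
            "termCode": "UPKA",
            "inDefinedTermSet": "https://vocab.nerc.ac.uk/collection/P06/current/",
        },
    },
}
for code, term_set, url in collect_defined_terms(proposal):
    print(code, term_set)
```

Note that `uom` is the proposal's own property name; it is not a schema.org property, so general-purpose harvesters would not know to look for it.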

Let me know what you think.

Best,
Alexandra

mbjones commented 5 months ago

While I also agree with the inconsistency issues raised by @alko-k and @dr-shorthair, I do think there is value in sticking to the SDO recommended ranges rather than deviating to DefinedTerm where SDO recommends PropertyValue. Much of the web follows SDO recommendations. If we were to make a change, there are other places in the SOSO guidelines where this pattern is used and would need to change too, such as the controlled vocabularies for temporal coverage in deep time. So the change would have fairly wide consequences for the guidelines, and would deviate from SDO.

I think that, in general, the major harvesters understand that PropertyValue is used to reference linked vocabularies, where schema:propertyID holds the URI of the controlled term and unitCode holds the URI of the unit definition. I think the example from our guidelines expresses essentially the same information as the DefinedTerm example in the previous comment, and Google properly interprets it as a controlled vocabulary term URI:

  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "latitude",
      "propertyID": "http://purl.obolibrary.org/obo/NCIT_C68642",
      "url": "https://www.sample-data-repository.org/dataset-parameter/665787",
      "description": "Latitude where water samples were collected; north is positive. Latitude is a geographic coordinate which refers to the angle from a point on the Earth's surface to the equatorial plane",
      "unitText": "decimal degrees",
      "unitCode": "http://qudt.org/vocab/unit/DEG",
      "minValue": 15.0,
      "maxValue": 45.0
    }
  ]

The only thing DefinedTerm provides in addition is inDefinedTermSet, which would certainly be nice to have in there, but hopefully is discoverable from the term URI itself.
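A minimal Python sketch of the point above: re-expressing a SOSO-style PropertyValue in the DefinedTerm shape proposed earlier, assuming the mapping described here (propertyID carries the term URI, unitCode the unit URI). The function name is hypothetical, and `uom` mirrors the earlier proposal rather than schema.org. `inDefinedTermSet` is left empty because the PropertyValue record does not carry it; it would have to be looked up from the term URI.

```python
def property_value_as_term(pv: dict) -> dict:
    """Re-express a SOSO-style PropertyValue as a DefinedTerm-shaped dict.

    Hypothetical mapping for illustration: propertyID -> url and
    unitCode -> uom.url. inDefinedTermSet cannot be recovered from
    the record itself.
    """
    term = {
        "@type": "DefinedTerm",
        "name": pv.get("name"),
        "url": pv.get("propertyID"),
        "inDefinedTermSet": None,  # would need a lookup via the term URI
    }
    if "unitCode" in pv:
        term["uom"] = {
            "@type": "DefinedTerm",
            "name": pv.get("unitText"),
            "url": pv["unitCode"],
        }
    return term


latitude = {
    "@type": "PropertyValue",
    "name": "latitude",
    "propertyID": "http://purl.obolibrary.org/obo/NCIT_C68642",
    "unitText": "decimal degrees",
    "unitCode": "http://qudt.org/vocab/unit/DEG",
}
print(property_value_as_term(latitude))
```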

ashepherd commented 5 months ago

I agree with @mbjones on following the schema.org recommended ranges. For Dataset publishers (possibly outside our discipline(s)) who want to express the actual observed or monitored value of a variable, I think PropertyValue lets them do this with the value property. If needed, those publishers can also specify the reference frame for interpreting that value using valueReference.
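A minimal Python sketch of a PropertyValue that carries an observed value plus a valueReference, as described above. The value 45.0, the EPSG:4326 reference URI, and typing the reference as a DefinedTerm are illustrative assumptions (schema.org lists several types for valueReference; check the current definition), and the builder function is hypothetical.

```python
import json


def observed_property_value(name, property_id, value, unit_code,
                            reference_uri, reference_name):
    """Build a PropertyValue with an observed value and a reference frame.

    Hypothetical builder for illustration; the argument values used below
    are examples, not taken from any real dataset.
    """
    return {
        "@type": "PropertyValue",
        "name": name,
        "propertyID": property_id,  # what is measured (controlled term URI)
        "value": value,             # the observed magnitude
        "unitCode": unit_code,
        # Reference frame for interpreting the value (assumed DefinedTerm).
        "valueReference": {
            "@type": "DefinedTerm",
            "name": reference_name,
            "url": reference_uri,
        },
    }


pv = observed_property_value(
    "latitude",
    "http://purl.obolibrary.org/obo/NCIT_C68642",
    45.0,
    "http://qudt.org/vocab/unit/DEG",
    "http://www.opengis.net/def/crs/EPSG/0/4326",
    "WGS 84",
)
print(json.dumps(pv, indent=2))
```

This keeps the "what is measured" semantics in propertyID while the magnitude lives in value, which is the distinction raised earlier in the thread.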