ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

Issue27 update variable measured #168

Closed smrgeoinfo closed 2 years ago

smrgeoinfo commented 3 years ago

Trying to get the ball rolling on this. The ADR references the new text to place in Dataset.md, which is ready for review, based on the issue 27 discussion. The next step is review; Adam, can you add other reviewers as you see fit?

mbjones commented 3 years ago

@smrgeoinfo @ashepherd I have been looking at this and plan to submit a review, but I am out tomorrow and so I likely won't make the call. @amoeba, @mpsaloha, and I talked about it and so hopefully they will be able to make the discussion.

rduerr commented 3 years ago

I've been reading this, fixing a few typos and have a few questions:

  1. In the section Value Range is Controlled Vocabulary the issue of how one handles a long list of TermCodes probably needs discussion? (Sorry wasn't around for the last meeting or so - so this might have been discussed).
  2. In that same section, what would the file in this line "rangeincludes": [ https://www.astromat.org/vocab/calcavg] have to look like?
  3. Can we rename that section to be something like "Variable uses a Controlled Vocabulary"?
  4. Why can't we replace the section header "Variable is represented by a dimensioned set of values (grid, coverage, time series, data cube)" by "Variable is an array"?
  5. So actually the example of metadata about a variable is given in the "Variable is represented by a dimensioned set of values (grid, coverage, time series, data cube)" section so should it really be discussed in the Structured Value section? It would simplify that section to eliminate those bits (or at least reference the example above that section)...
  6. The sentence "Recommended data types for container PropertyValue with valueReference child elements documenting the attributes or component and measured values:" is way too complex - can it be turned into English?

smrgeoinfo commented 3 years ago

@rduerr

In the section Value Range is Controlled Vocabulary the issue of how one handles a long list of TermCodes probably needs discussion? (Sorry wasn't around for the last meeting or so - so this might have been discussed).

This is related to the next question. What I am thinking is that in this case the 'rangeIncludes' could be the URI for a vocabulary of any length. In the vision of a linked data universe, this URI could be dereferenced (with content negotiation) for various representations.

In that same section, what would the file in this line "rangeincludes": [ https://www.astromat.org/vocab/calcavg] have to look like?

For schema.org purposes, the intention is that the URI could be dereferenced to obtain a schema:DefinedTermSet representation of the vocabulary.
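
As a rough illustration of what such a dereferenced representation might contain, here is a minimal Python sketch that builds a schema:DefinedTermSet as JSON-LD. The two terms, their codes, and their fragment URIs are hypothetical placeholders, not the actual contents of the astromat vocabulary; only the DefinedTermSet/DefinedTerm structure is the point.

```python
import json

# Vocabulary URI taken from the example under discussion.
VOCAB_URI = "https://www.astromat.org/vocab/calcavg"


def make_term_set(vocab_uri, terms):
    """Build a schema:DefinedTermSet whose members are schema:DefinedTerm nodes."""
    return {
        "@context": "http://schema.org/",
        "@type": "DefinedTermSet",
        "@id": vocab_uri,
        "hasDefinedTerm": [
            {
                "@type": "DefinedTerm",
                # Hypothetical per-term identifier scheme (fragment on the vocabulary URI).
                "@id": f"{vocab_uri}#{code}",
                "termCode": code,
                "name": name,
                "inDefinedTermSet": vocab_uri,
            }
            for code, name in terms
        ],
    }


# Hypothetical terms, for illustration only.
term_set = make_term_set(VOCAB_URI, [("CALC", "calculated"), ("AVG", "averaged")])
print(json.dumps(term_set, indent=2))
```

Because the terms live behind the vocabulary URI, a long list of TermCodes would not have to be repeated inside every dataset description.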

smrgeoinfo commented 3 years ago

Can we rename that section to be something like "Variable uses a Controlled Vocabulary"?

I was trying to be precise in use of language; can we get more input on what will be most intelligible for our users?

smrgeoinfo commented 3 years ago

Why can't we replace the section header "Variable is represented by a dimensioned set of values (grid, coverage, time series, data cube)" by "Variable is an array"?

If the dimensions of arrays always have some physical interpretation, this would work. I was trying to align language with usage in DataCube and NetCDF. Need more input on what will be most intelligible to users...

smrgeoinfo commented 3 years ago

So actually the example of metadata about a variable is given in the "Variable is represented by a dimensioned set of values (grid, coverage, time series, data cube)" section so should it really be discussed in the Structured Value section? It would simplify that section to eliminate those bits (or at least reference the example above that section)...

I think the problem here is that the example provided is really a dataset-level description, not a variableMeasured description. Muddled thinking on my part. I have edited text to move this to a section about array, grid or coverage datasets.

smrgeoinfo commented 3 years ago

The sentence "Recommended data types for container PropertyValue with valueReference child elements documenting the attributes or component and measured values:" is way too complex - can it be turned into English?

Edited to (try to) make clearer.

ashepherd commented 3 years ago

Update qudt:dataType to use the Schema.org DataType class instead of a literal string (see the JSON-LD Playground links below) by modifying the example to:

...
"qudt:dataType": { "@id": "Text" }
...

See: https://json-ld.org/playground/#startTab=tab-expanded&json-ld=%7B%22%40context%22%3A%7B%22%40base%22%3A%22http%3A%2F%2Fschema.org%2F%22%7D%2C%22qudt%3AdataType%22%3A%7B%22%40id%22%3A%22Text%22%7D%7D&context=%7B%7D

I had to specify the "@base" to make it work in this way. The other way would be to use the full URI:

https://json-ld.org/playground/#startTab=tab-table&json-ld=%7B%22%40context%22%3A%22http%3A%2F%2Fschema.org%2F%22%2C%22qudt%3AdataType%22%3A%7B%22%40id%22%3A%22http%3A%2F%2Fschema.org%2FText%22%7D%7D&context=%7B%7D
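
Both forms should end up at the same identifier. A minimal Python sketch of why, using standard relative-reference resolution from `urllib.parse.urljoin` as a stand-in for what a JSON-LD processor does with "@base" (this is only an illustration of the IRI resolution step, not a JSON-LD processor; the helper name `resolve_id` is made up here):

```python
from urllib.parse import urljoin

# The "@base" value used in the first Playground link.
BASE = "http://schema.org/"


def resolve_id(base, node_id):
    """Resolve a possibly relative "@id" value against a base IRI."""
    return urljoin(base, node_id)


# {"@id": "Text"} with "@base" set, versus the full URI written out.
relative_form = resolve_id(BASE, "Text")
absolute_form = resolve_id(BASE, "http://schema.org/Text")

print(relative_form)
print(relative_form == absolute_form)
```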

smrgeoinfo commented 3 years ago

Responding to @ashepherd comment

qudt:dataType to use the Schema.org DataType class

This is a change that would need to be propagated through the entire text and all the examples. See #170. For now I'm going to change to full URI strings like this:

"qudt:dataType": "http://schema.org/Text" for all the dataType identifiers. Hopefully we'll get some feedback on #170.

smrgeoinfo commented 3 years ago

The Dataset.md with revised variableMeasured section has been added to the PR, along with several examples for representing different kinds of variables.

amoeba commented 3 years ago

Hi @smrgeoinfo, this is looking good. I know the group is getting close to wrapping up but I wanted to run two thoughts by you:

  1. I really liked the previous version that had the tiers (starter, etc). It seems to me the vast majority of dataset metadata records that might be serialized this way won't have an IRI to pair with every variable so I think starting with a lower tier at the top of the document would make the guidance easier to digest and adopt
  2. Something feels a tad off about bringing schema:Observation into the picture that might make this guidance tricky to adopt and might make for some odd entailments. I think it's because the root property here is variableMeasured and variables (attributes, features) are more hops across the graph than we're showing here and have a different shape. I'd love to be able to express observation-level semantics like you show but I wonder if you've thought about whether there'd be any inferencing problems related to doing it this way?

smrgeoinfo commented 3 years ago

@mpsaloha what do you think about including the mention of schema:Observation?

smrgeoinfo commented 3 years ago

@amoeba I added Tier 1 (name, description), Tier 2 (use propertyID http URI), Tier 3 (other schema.org properties for numeric values), and Tier 4 (data type, enumerated values), and an 'Advanced' section for structured values and reference values.
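
For readers skimming the thread, the first three tiers could look roughly like the following Python sketch, which builds schema:PropertyValue entries for variableMeasured. The variable name, description, propertyID URI, units, and value range are made-up placeholders, and the choice of unitText, minValue, and maxValue for the third tier is an assumption based on the summary above rather than the text in Dataset.md.

```python
import json


def tier1(name, description):
    """Tier 1: name and description only."""
    return {"@type": "PropertyValue", "name": name, "description": description}


def tier2(name, description, property_id):
    """Tier 2: add an http URI in propertyID that identifies the variable."""
    variable = tier1(name, description)
    variable["propertyID"] = property_id
    return variable


def tier3(name, description, property_id, unit_text, min_value, max_value):
    """Tier 3: add schema.org properties that describe numeric values."""
    variable = tier2(name, description, property_id)
    variable.update(
        {"unitText": unit_text, "minValue": min_value, "maxValue": max_value}
    )
    return variable


# All values below are hypothetical placeholders.
variable = tier3(
    "water_temp",
    "Water temperature measured at the sensor",
    "http://example.org/variables/water-temperature",
    "degree Celsius",
    0.5,
    24.0,
)

dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "variableMeasured": [variable],
}
print(json.dumps(dataset, indent=2))
```

Each tier only adds keys to the previous one, so a record that starts at the lowest tier can be upgraded later without restructuring.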

Waiting to hear what @mpsaloha thinks about the observation section.

amoeba commented 3 years ago

Looks great, thanks @smrgeoinfo.

smrgeoinfo commented 2 years ago

Discussion at online meeting 2021-12-06: move Tier 4 to a new experimental document. Tiers 1-3 look OK; Matt J will still review the text.

nein09 commented 2 years ago

@mbjones is it worth including some of that stuff in a gitignore, in a follow-on PR? Are those files likely to creep back in?

smrgeoinfo commented 2 years ago

What about moving all that stuff to an Issue27archive branch and deleting it from the issue27-forPullRequest branch? I know if we delete them they're still in the repo history, but they would be harder to find. Alternatively I could just copy them to my local machine here in case they're needed again; I think I generated most of that cruft... :)

mbjones commented 2 years ago

Yeah, I think moving all of that stuff to another branch sounds great. They are useful, but because they go beyond the current guidelines and in many ways don't comply with the guidelines, I think they are confusing in their current location. So, let's tuck them away in another branch so we can revisit them in the future ;-)

mbjones commented 2 years ago

OK, I removed the extraneous files from this PR branch, but saved them in https://github.com/ESIPFed/science-on-schema.org/tree/history-save-27-variable-measured for future reference. At this point, I think this PR #168 is ready to merge to develop.

mbjones commented 2 years ago

Merged variableMeasured to develop. Any further changes should be proposed in a new issue.