Term for data "level"

ferag commented 1 year ago

Hi all,

I couldn't find a way to include information about the "data level" of a dataset. By that I mean distinguishing whether the data were created directly (e.g. from a sensor) or are the result of a model (like derived data). I guess otherEntity is not the way. Is there anyone who has dealt with something like that?

Thanks in advance,

Fernando

twhiteaker commented 1 year ago

The CUAHSI Observations Data Model v1.1 has the notion of a ValueType, with possible values "Field Observation", "Laboratory Observation", and "Model Simulation Results". I don't know of an equivalent in EML.

For BLE LTER, we just indicate the data level in various places like the data package title, abstract, methods, and descriptions of tables.

Do you want to indicate data level for the benefit of a human reader, or for a machine that would recognize a model-based dataset and process it differently?

If a best practice is developed for indicating data level, it should be added to the existing Best Practices for Model-Based Datasets.

ferag commented 1 year ago

Hi! Thanks for the feedback, very useful. The aim is to help machines recognize the type of data so that they can process it properly. Maybe we could use a controlled vocabulary for that, because our intention is to use a single term.

Cheers,

Fernando

mbjones commented 1 year ago

Yes, I think that a controlled vocabulary with an appropriate EML semantic annotation would be the most effective. For example, given the existence of a vocabulary that articulates your desired processing level classification, you could add an annotation like so:

<eml>
...
<dataset id="dataset-12345">
    ...
    <annotation>
        <propertyURI label="has processing level">http://sweetontology.net/relaProvenance/hasProcessingLevel</propertyURI>
        <valueURI label="Level 1">http://sweetontology.net/stateDataProcessing/Level1</valueURI>
    </annotation>
...

I'm not sure if that SWEET ontology with its Processing level terminology is what you need, but I think it is related to the NASA data processing levels vocabulary. Do you have a particular vocabulary in mind?
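
For the machine-readable use case, here is a minimal Python sketch of how a consumer might pick up such an annotation. It is only an illustration: the sample document is a simplified, namespace-free fragment modeled on the example above, the function name `processing_levels` is made up for this sketch, and the SWEET URIs are copied from the example without checking that they resolve. Real EML documents declare XML namespaces on the root element, so treat the element paths here as an assumption.

```python
import xml.etree.ElementTree as ET

# Property URI copied from the annotation example above (not verified to resolve).
PROCESSING_LEVEL_PROPERTY = "http://sweetontology.net/relaProvenance/hasProcessingLevel"

# Hypothetical, simplified EML fragment used only for illustration.
SAMPLE_EML = """
<eml>
  <dataset id="dataset-12345">
    <annotation>
      <propertyURI label="has processing level">http://sweetontology.net/relaProvenance/hasProcessingLevel</propertyURI>
      <valueURI label="Level 1">http://sweetontology.net/stateDataProcessing/Level1</valueURI>
    </annotation>
  </dataset>
</eml>
"""


def processing_levels(eml_text):
    """Return (label, URI) pairs for each processing-level annotation on the dataset."""
    root = ET.fromstring(eml_text)
    found = []
    # Look only at annotations that are direct children of the dataset element.
    for annotation in root.iterfind("./dataset/annotation"):
        prop = annotation.find("propertyURI")
        value = annotation.find("valueURI")
        if prop is None or value is None:
            continue  # skip incomplete annotations
        if (prop.text or "").strip() == PROCESSING_LEVEL_PROPERTY:
            found.append((value.get("label"), (value.text or "").strip()))
    return found


if __name__ == "__main__":
    for label, uri in processing_levels(SAMPLE_EML):
        print(label, uri)
```

A processing pipeline could then branch on the returned value URI rather than on free text in the title or abstract.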

mbjones commented 1 year ago

I also asked for input from the ESIP semtech cluster. We'll see if anyone responds: https://esip-all.slack.com/archives/CNV0W28H4/p1671592274000599

ferag commented 1 year ago

Hi Matthew,

Thanks for your support!

I don't have any vocabulary in mind, but we want to tag different kinds of data, such as:

Data from sensors
Biodiversity observations
Remote-sensing indicators
Species distribution model outputs
etc.

So we need to identify those types somehow. BTW: Nice profile. I think we are doing similar things ;)
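
As a rough sketch of what a small controlled vocabulary for these types could look like, the Python below defines one term per category from the list above. Everything here is hypothetical: the `DataOrigin` name, the `https://example.org/data-origin/` base URI, and the term identifiers are placeholders, not an existing vocabulary. A real one would need a persistent, resolvable namespace and agreed definitions.

```python
from enum import Enum

# Hypothetical base URI, used only for illustration.
BASE_URI = "https://example.org/data-origin/"


class DataOrigin(Enum):
    """Illustrative terms taken from the list above; not an existing vocabulary."""

    SENSOR = "Data from sensors"
    BIODIVERSITY_OBSERVATION = "Biodiversity observations"
    REMOTE_SENSING_INDICATOR = "Remote-sensing indicators"
    SDM_OUTPUT = "Species distribution model outputs"

    @property
    def uri(self):
        # Derive a term URI from the member name, e.g. SDM_OUTPUT -> .../sdm-output
        return BASE_URI + self.name.lower().replace("_", "-")


def origin_from_uri(uri):
    """Return the term for a URI, raising ValueError for anything outside the vocabulary."""
    for term in DataOrigin:
        if term.uri == uri:
            return term
    raise ValueError(f"Unknown data origin URI: {uri}")


if __name__ == "__main__":
    for term in DataOrigin:
        print(term.uri, "-", term.value)
```

Each term URI could then be used as the value of a single annotation, so that a machine only has to compare one string.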

Fernando

mbjones commented 1 year ago

Thanks, @ferag. Let us know how it works out, and what you decide to use for a vocabulary if you go that route.

As there don't seem to be any bugs or features in EML to resolve with this ticket, I am going to close it for housekeeping. Feel free to reopen it if something needs to be resolved.

rrovetto commented 1 year ago

@ferag You can create your own categories, metadata elements, or vocabulary for what you need. I'd recommend that, since it ensures you have the content you intend. In any case, consider what you mean by 'data level' (and other phrases or concepts), and whether that meaning matches whichever categories you create or use. E.g., from what you wrote, it sounds like you mean the source of the data or the way the data was generated. Is that what you mean by 'level'? Would 'source' be more accurate for your intended meaning? And so on.

Based on what you wrote, I can create a conceptual model for what you may need. Contact me at links below.

@mbjones I saw your post about this in the cluster. I gave a presentation on SWEET at a past conference, and I also just spent some personal time looking at the ontology modules. Based on the structure and the formal semantics expressed by the axioms, there is some correspondence with the NASA data levels, but it is not explicitly stated as such. Regarding definitions, there do not appear to be specific textual definitions for those terms. However, an examination of the structure can yield insight, if indirect, into the definition intended by the creators of that portion of the ontology. In general, one can create and propose definitions for SWEET. In my presentation, I suggested some paths toward that, as generic and high-level ontology development aspects are a research pursuit.

Actively needing employment, phd study opportunities.
Contact form
Direct hire for selected topics.

ferag commented 1 year ago

Dear Robert,

Thanks for your feedback. I'm not sure if source is the proper concept, since what we want to categorize is whether the data have been collected from a "real" environment (a sensor, an observation) or have been "artificially" created (a simulation result). I will contact you for more details. Thank you very much.

Fernando

NCEAS / eml

Term for data "level" #391