ESIPFed / stc

Repository for the Semantic Technologies Committee
http://wiki.esipfed.org/index.php/Semantic_Technologies
Apache License 2.0

use case for data workflow #40

Closed: TWellman closed this issue 4 years ago

TWellman commented 4 years ago

I suggest adding a use case for a data processing workflow. We have one example to submit.

lewismc commented 4 years ago

Excellent. Don’t worry about a pull request. If you can just provide the description here, then I will create the markup. That will be easier for you. Thanks Tristan.

Lewis
Dr. Lewis J. McGibbney Ph.D, B.Sc
Skype: lewis.john.mcgibbney

TWellman commented 4 years ago

@lewismc

OK, no problem. I have a branch started but won't edit it further or open a pull request. The HTML is below.

Use of Ontology Information in Data Processing Workflows

Tristan Wellman, Science Analytics and Synthesis, U.S. Geological Survey


An ontology is created in the Darwin Core convention to follow protocol required by OBIS-USA and NOAA NCEI. The instantiation on ESIP COR provides a stable, publicly available endpoint that is used in the data processing workflow. As part of the workflow, basic ontology information and external supplementary information describing each variable (term) are infused as metadata into NetCDF data files. Real-time feedback could help ensure that variable information and ontology information stay aligned. As terms are added or modified, ontology versioning is needed to support historical data products that reference this resource.
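Infusing ontology information into a NetCDF variable amounts to attaching a set of attributes to it. The following is a minimal sketch, assuming a hypothetical `TermInfo` record and the hypothetical attribute names `ontology_term_uri` and `ontology_version`; `long_name`, `units`, and `comment` are common CF-convention attributes. It neither queries ESIP COR nor writes a file; it only builds the attribute dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TermInfo:
    """Hypothetical record for one ontology term (not an ESIP COR type)."""
    uri: str
    label: str
    definition: str
    units: Optional[str] = None


def build_variable_attributes(
    term: TermInfo,
    ontology_version: str,
    supplementary: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return NetCDF-style variable attributes describing one term."""
    attrs = {
        "long_name": term.label,
        "comment": term.definition,
        # Hypothetical attribute names linking the variable back to the
        # exact ontology term and version it was described with.
        "ontology_term_uri": term.uri,
        "ontology_version": ontology_version,
    }
    if term.units is not None:
        attrs["units"] = term.units
    # External supplementary information is added, but it never
    # overwrites attributes derived from the ontology itself.
    for key, value in (supplementary or {}).items():
        attrs.setdefault(key, value)
    return attrs
```

With the netCDF4-python library, a dictionary like this could be attached to an existing variable through `Variable.setncatts`. Storing the ontology version next to each variable is what would let a historical data product point back to the exact ontology state it used.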

User profile: A user or institution that expects to evolve ontology records in an automated workflow and requires reproducibility of the resulting data products that use ontology information.

Scenario: An institution in the Earth science community uses semantic vocabularies stored on public endpoints to describe scientific terms and variables in their data products. When these data products are created or revised, the ontologies should be updated in step. Versioning should be used to reproduce the vocabulary information used in historical case studies.

Workflow:

  1. A code-driven analysis package is activated to process a collection of data files.
  2. A series of quality control and processing functions is run in the processing workflow.
  3. A processing function calls ESIP COR to match vocabulary terms defined within the cached ontology.
  4. Additional variable (term) information, such as variable type, units, and alias name, is retrieved to enhance the default information.
  5. Where vocabulary terms are new, or vocabulary information has been revised or enhanced, the ESIP COR instantiation is updated to include the latest publicly available scientific information, potentially in real time.
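Steps 3 to 5 can be sketched offline against an already-cached ontology. The cache layout (term id mapped to a dictionary with an optional `alias`), the function names, and the rule that ontology values take precedence over a variable's defaults are all assumptions for illustration; the actual ESIP COR query is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical cache layout: term id -> term information. In the workflow
# above this cache would be filled from ESIP COR; here it is a plain dict
# so the sketch runs offline.
CachedOntology = Dict[str, Dict[str, str]]


def build_lookup(ontology: CachedOntology) -> Dict[str, str]:
    """Map lower-cased term ids and aliases to the canonical term id."""
    lookup: Dict[str, str] = {}
    for term_id, info in ontology.items():
        lookup[term_id.lower()] = term_id
        alias = info.get("alias")
        if alias:
            lookup[alias.lower()] = term_id
    return lookup


def enrich_variables(
    variables: Dict[str, Dict[str, str]],
    ontology: CachedOntology,
) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Match variables to terms, enrich them, and report unmatched names."""
    lookup = build_lookup(ontology)
    enriched: Dict[str, Dict[str, str]] = {}
    unmatched: List[str] = []
    for name, defaults in variables.items():
        term_id = lookup.get(name.lower())  # step 3: match the term
        if term_id is None:
            # Step 5: a candidate new term to submit as an ontology update.
            unmatched.append(name)
            enriched[name] = dict(defaults)
            continue
        # Step 4: ontology information enhances (and overrides) defaults.
        merged = dict(defaults)
        merged.update(ontology[term_id])
        merged["term_id"] = term_id
        enriched[name] = merged
    return enriched, unmatched
```

Returning the unmatched names separately keeps the update of the ontology (step 5) as an explicit, reviewable action rather than a side effect of processing.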

Requirements implied by this use case:

  1. The ontology portal has automated versioning capabilities that preserve ontology definitions in real time. Ontologies can be retrieved by version at the user's request.
  2. The ontology portal allows authenticated users to update, create, or delete ontologies using a simple API, perhaps generating a modified temporary ontology while preserving the original parent ontology until a review has been completed.
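The two requirements can be illustrated with a small in-memory store. `VersionedOntologyStore` and its methods are hypothetical and do not reflect any ESIP COR API, and authentication is omitted. Each publish appends an immutable copy, so every earlier version stays retrievable, and proposed edits live in a separate draft (the "modified temporary ontology") until they are approved.

```python
import copy
from typing import Dict, List, Optional


class VersionedOntologyStore:
    """In-memory sketch of the two requirements (not an ESIP COR API)."""

    def __init__(self) -> None:
        self._versions: List[Dict[str, dict]] = []  # append-only history
        self._drafts: Dict[str, Dict[str, dict]] = {}

    def publish(self, terms: Dict[str, dict]) -> int:
        """Store a new version automatically; return its number (from 1)."""
        self._versions.append(copy.deepcopy(terms))
        return len(self._versions)

    def get(self, version: Optional[int] = None) -> Dict[str, dict]:
        """Retrieve a specific version, or the latest one by default."""
        if not self._versions:
            raise LookupError("no versions published yet")
        index = len(self._versions) if version is None else version
        if not 1 <= index <= len(self._versions):
            raise LookupError(f"unknown version: {version}")
        return copy.deepcopy(self._versions[index - 1])

    def propose(self, draft_id: str, changes: Dict[str, Optional[dict]]) -> None:
        """Create a temporary child ontology; the parent stays untouched.

        A value of None in `changes` marks that term for deletion.
        """
        draft = self.get()
        for term_id, info in changes.items():
            if info is None:
                draft.pop(term_id, None)
            else:
                draft[term_id] = copy.deepcopy(info)
        self._drafts[draft_id] = draft

    def approve(self, draft_id: str) -> int:
        """After review, publish the draft as the next version."""
        return self.publish(self._drafts.pop(draft_id))
```

The deep copies keep callers from mutating stored versions, which is what makes a historical version reproducible after later edits.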


lewismc commented 4 years ago

Excellent, @TWellman. I'll write this up as a PR and commit it to the document.

lewismc commented 4 years ago

Addressed via d52523c1155b771b432a16074e3143c1b47315d8. Thank you, @TWellman.

graybeal commented 4 years ago

Hi Tristan, this use case is very nicely written. Can you elucidate a bit what is meant by "An ontology is created in the Darwin Core convention to follow protocol required by OBIS-USA and NOAA NCEI."? That is, what requirements would this ontology need to satisfy to be "in the Darwin Core convention" and "to follow protocol required…"?

TWellman commented 4 years ago

Perhaps a more generic description in the first sentence would be valuable.

"A base ontology is created to describe term identifiers, labels, and definitions, which are used for processing data records through OBIS-USA and NOAA NCEI."
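A base ontology of term identifiers, labels, and definitions could be encoded in several RDF vocabularies. The following is one minimal sketch, assuming SKOS (`skos:prefLabel`, `skos:definition`) and Turtle output, with a hypothetical `Term` record; the thread does not say whether OBIS-USA, NOAA NCEI, or ESIP COR expect this particular encoding.

```python
from typing import Iterable, NamedTuple


class Term(NamedTuple):
    """Hypothetical base-ontology record: identifier, label, definition."""
    identifier: str  # full IRI of the term
    label: str
    definition: str


def _literal(text: str) -> str:
    """Quote text as an English Turtle literal, escaping \\, " and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def to_turtle(terms: Iterable[Term]) -> str:
    """Serialize terms as SKOS concepts (an assumed vocabulary choice)."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
    ]
    for term in terms:
        lines.append(f"<{term.identifier}> a skos:Concept ;")
        lines.append(f"    skos:prefLabel {_literal(term.label)} ;")
        lines.append(f"    skos:definition {_literal(term.definition)} .")
        lines.append("")
    return "\n".join(lines)
```

For example, `to_turtle([Term("http://example.org/ont/waterTemp", "Water temperature", "Temperature of the water column.")])` yields one `skos:Concept` block carrying the label and definition.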

lewismc commented 4 years ago

Thanks @TWellman this has been accommodated in current documentation.