ILIAD-ocean-twin / data_access_api

Apache License 2.0

Expose Oilspills data pilot #4

Open pzaborowski opened 1 year ago

pzaborowski commented 1 year ago

The pilot execution implemented in the Application Package produces STAC and NetCDF files. The example file is meant to be used for testing and eventually on the exemplars.

miguelcorreia19 commented 1 year ago

Here are the variables and dimensions from the NetCDF file produced by the OpenOil model:

miguelcorreia19 commented 1 year ago

Here is a NetCDF example result from the OpenOil model:

https://pipe-drive.inesctec.pt/output_examples/openoil/simulation.nc
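To make the discussion concrete, here is a minimal sketch of how the dimensions, variables and their attributes can be listed once the file above is downloaded; it assumes xarray is installed and a local copy named simulation.nc, and nothing below is taken from the actual file contents:

```python
# Sketch: inspect the OpenOil result file. Assumes `pip install xarray netcdf4`
# and that simulation.nc has been downloaded from the URL above.
import xarray as xr

ds = xr.open_dataset("simulation.nc")

print("Dimensions:", dict(ds.sizes))
for name, var in ds.variables.items():
    # attrs is where CF metadata such as units/standard_name/long_name would live
    print(name, var.dims, var.dtype, dict(var.attrs))
```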

pzaborowski commented 1 year ago

Thanks! The file declares the CF-1.6 convention, but it is missing some highly recommended elements. Can we do something about it? There are a few tools to check this, like the easy-to-use online checker at https://compliance.ioos.us/index.html
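The same checks can also be run locally; a sketch assuming the IOOS compliance-checker package is installed (pip install compliance-checker) and a local copy of the sample file:

```python
# Sketch: run the IOOS Compliance Checker CF-1.6 suite on the sample file via
# its command-line interface and print the report. Assumes the
# compliance-checker package is installed and simulation.nc is local.
import subprocess

report = subprocess.run(
    ["compliance-checker", "--test=cf:1.6", "simulation.nc"],
    capture_output=True,
    text=True,
)
print(report.stdout)  # lists the missing recommended attributes per variable
```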

Most important for the moment: lat, lon, z are neither dimensions nor coordinates, unlike in these examples: https://www.ncei.noaa.gov/netcdf-templates. This makes the interpretation quite custom: not only do we have to guess the coordinate system, but the variables shall also map to the data domain, which by default is done with coordinates.

Second, the variable descriptions are missing units and a URI for the measurable. The first is a CF recommendation, the second is ours. Alternatively, they may be supplied in the configuration mapping.
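To illustrate both points, a sketch of the kind of change meant here, using xarray; the variable names (lon, lat, z, sea_surface_temperature) and the chosen attribute values are assumptions for illustration, not read from the actual OpenOil output:

```python
# Sketch: promote lat/lon/z to coordinates so they map to the data domain, and
# add units (CF recommendation) plus a URI for the measurable (our addition).
# All variable names and attribute values here are illustrative assumptions.
import xarray as xr

ds = xr.open_dataset("simulation.nc")

# Positional variables become coordinates instead of plain data variables.
ds = ds.set_coords(["lon", "lat", "z"])
ds["lon"].attrs.update({"units": "degrees_east", "standard_name": "longitude"})
ds["lat"].attrs.update({"units": "degrees_north", "standard_name": "latitude"})
ds["z"].attrs.update({"units": "m", "standard_name": "depth", "positive": "down"})

# Units plus a resolvable URI for the observed property.
ds["sea_surface_temperature"].attrs.update({
    "units": "degree_Celsius",
    "standard_name": "sea_surface_temperature",
    "standard_name_uri": "http://vocab.nerc.ac.uk/standard_name/sea_surface_temperature/",
})

ds.to_netcdf("simulation_cf.nc")
```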

pzaborowski commented 1 year ago

We're considering two cases: visualisation and data access for further reuse. An example visualisation is provided by the source model repository: https://github.com/OpenDrift/opendrift, where data is organised per snapshot (a single trajectory probably does not matter much in this scenario).

We're now considering two options for this type of data:

  1. CoverageJSON/NetCDF trajectories - these can represent trajectories quite well, while the default understanding follows the CF structure with lat-lon in the domain: https://covjson.org/playground/ (a minimal sketch follows after this list)
  2. A feature collection extended with a domain set defined according to https://docs.ogc.org/is/19-045r3/19-045r3.html, in particular to have a datetime with a point cloud in addition to a flat list of observations per point and time.
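As a sketch for option 1 (built as a Python dict only to keep the examples in one language), a single trajectory in the CoverageJSON Trajectory domain type; the coordinate and range values are invented, the vertical axis is left out for brevity, and the parameter descriptions are assumed to live at collection level:

```python
# Sketch: one particle trajectory as a CoverageJSON "Trajectory" coverage.
# Values are invented; parameter descriptions are assumed to be provided at
# the collection level rather than repeated per coverage.
import json

coverage = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "Trajectory",
        "axes": {
            # one (t, x, y) tuple per particle position along the trajectory
            "composite": {
                "dataType": "tuple",
                "coordinates": ["t", "x", "y"],
                "values": [
                    ["2012-01-17T12:33:51Z", 11.0, 2.0],
                    ["2012-01-17T13:33:51Z", 12.0, 3.0],
                ],
            }
        },
        "referencing": [
            {"coordinates": ["x", "y"],
             "system": {"type": "GeographicCRS",
                        "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"}},
            {"coordinates": ["t"],
             "system": {"type": "TemporalRS", "calendar": "Gregorian"}},
        ],
    },
    "ranges": {
        "sea_surface_temperature": {
            "type": "NdArray",
            "dataType": "float",
            "axisNames": ["composite"],
            "shape": [2],
            "values": [1.0, 2.0],
        }
    },
}

print(json.dumps(coverage, indent=2))
```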
pzaborowski commented 1 year ago

Considering a potential scenario of visualising such data via an API, the first step is usually reading the domain set and rangeset, and the second is querying the data according to the visualisation state.

For the first step, the collection metadata can implement features like http://defs-dev.opengis.net/iliad-pygeo/collections/records?f=json

with the timestamps and area:

"extent":` {
        "spatial": {
            "bbox": [
                [
                    -180,//x
                    -90,//y
                    -40,//z
                    180,
                    90,
                    0
                ]
            ],
            "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
        },
        "temporal": {
            "values": [
                    "2011-11-11T11:11:11+00:00",
                    "2011-11-11T12:11:11+00:00"...
            ]
        }
    },

extended by the property definitions of the rangeset:

"parameter-names": {
        "sea_surface_temperature": {
            "id": "sea_surface_temperature",//here shall be absolute URI or id resolvable by the given context
            "type": "Quantity",
            "name": "sea surface temperature",
            "encodingInfo": {
                "dataType": "http://www.opengis.net/def/dataType/OGC/0/float64"
            },
            "nodata": "null",
            "uom": {
                "id": "http://www.opengis.net/def/uom/UCUM/Deg C",
                "type": "UnitReference",
                "code": "Deg C"
            },
            "_meta": {
                "tags": {
                    "long_name": "SEA SURFACE TEMPERATURE",
                    "history": "From coads_climatology",
                    "units": "Deg C"
                }
            }
        },...

Given that we have the whole description of the values at the high (collection) level, we can expose the values without repeating it explicitly.

Then each feature could be a collection of points per timestamp (or a wrapping polygon, if we can generate one):

{
   "type": "FeatureCollection",
   "features": [
       {
           "type": "Feature",
           "id": "t1tj1",
           "geometry": {
               "type": "PointCloud",
               "coordinates": [[11.0,2.0,2], [12.0,3.0,3], [10.0,3.0,1]]
           },
           "properties": {
               "datetime": "2012-01-17T12:33:51Z",
               "sea_surface_temperature": [1,2,3],
...
           }
       }

The collection can then be limited using query parameters.
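For example, a client could read the collection metadata first and then request only the features needed for the current view; a sketch assuming a standard OGC API Features items endpoint under the collection URL above (the /items path, the collection id and the exact parameter handling are assumptions, not a deployed service):

```python
# Sketch: step 1 read the collection metadata (extent and parameter-names),
# step 2 query only what the current visualisation state needs. The base URL,
# the /items endpoint and the parameter handling are assumptions.
import requests

collection = "http://defs-dev.opengis.net/iliad-pygeo/collections/records"

meta = requests.get(collection, params={"f": "json"}).json()
bbox = meta["extent"]["spatial"]["bbox"][0]          # [minx, miny, minz, maxx, maxy, maxz]
timestamps = meta["extent"]["temporal"]["values"]

items = requests.get(
    f"{collection}/items",
    params={
        "f": "json",
        "bbox": ",".join(str(v) for v in bbox[:2] + bbox[3:5]),  # 2D subset of the 3D bbox
        "datetime": timestamps[0],   # one snapshot = one visualisation frame
        "limit": 1000,
    },
).json()

print(len(items.get("features", [])), "features returned")
```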

Transfer efficiency here assumes no resolution reduction of the kind known from tiling mechanisms. It could potentially be added via a new domain property, but that would also require an additional resolution-reduction mechanism to be implemented.

The other thread is the variables collection; we're putting them in one spreadsheet to identify potential common ones: https://docs.google.com/spreadsheets/d/12MadX52_fYrDkMfSZWSlQcJ6LxSjRANCR7YsGP6smBc/edit#gid=251363516

pzaborowski commented 1 year ago

Apparently, the variables without a defined unit in the sample file are in fact defined in https://cfconventions.org/Data/cf-standard-names/29/build/cf-standard-name-table.html and have URIs at least here: http://vocab.nerc.ac.uk/collection/P07/current/ and here: http://vocab.nerc.ac.uk/standard_name/

  1. It may be better, and would be compatible with CF-1.6, to include units in the variable attributes. @miguelcorreia19 what do you think?

  2. Which ones shall we use for reference? I'd choose http://vocab.nerc.ac.uk/standard_name as they can be mapped by URIs (not prefLabel). Shall we include the URIs in the attributes of the variables (see the sketch at the end of this comment)? What do you think?

  3. Variables with units in the sample file are not defined within the standard names - we'd need a repository for these.

  4. @rob-metalinkage @rapw3k @avillar Would it make sense to define a context file with the CF standard names, or simply add http://vocab.nerc.ac.uk/standard_name to the context? Having a profile with all the properties referenced from multiple dictionaries would perhaps allow us to use them directly instead of http://example.com/iliad-data-access/meta-model/observable-properties/ in the example generated for coverages:

    "parameters": {
      "@id": "iliad-props:hasParameter",
      "@container": "@index",
      "@context": {
        "@base": "http://example.com/iliad-data-access/meta-model/classes/",
        "description": "rdfs:label",
        "unit": "om2:hasUnit",
        "symbol": "om2:symbol",
        "observedProperty": {
          "@id": "sosa:observedProperty",
          "@context": {
            "@base": "http://example.com/iliad-data-access/meta-model/observable-properties/",
            "id": "@id",
            "label": {
              "@id": "rdfs:label",
              "@container": "@language"
            }
          }
        },

    The whole file is here: https://github.com/ILIAD-ocean-twin/data_access_api/blob/main/examples/OGC_EDR/edr_coverage.jsonld

4a. The other thing: the whole standard_names dictionary is quite big and the service does not resolve very efficiently, IMO, but can we assume we can import the whole dictionary in the 'profile' without listing the particular IDs/names explicitly?
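Regarding 2. and 3., a sketch of what writing the URIs into the variable attributes could look like, using the http://vocab.nerc.ac.uk/standard_name/ pattern mentioned above; the attribute name standard_name_uri is an illustrative choice, and whether every name in the file actually resolves has not been verified:

```python
# Sketch: for every variable that already has a CF standard_name, derive the
# corresponding NERC standard_name URI and store it as an extra attribute.
# "standard_name_uri" is an illustrative attribute name, not a CF convention.
import requests
import xarray as xr

ds = xr.open_dataset("simulation.nc")

for name, var in ds.variables.items():
    std = var.attrs.get("standard_name")
    if not std:
        continue  # item 3: variables without a standard name need another repository
    uri = f"http://vocab.nerc.ac.uk/standard_name/{std}/"
    # Optional sanity check that the concept actually exists in the vocabulary.
    if requests.head(uri, allow_redirects=True, timeout=30).ok:
        var.attrs["standard_name_uri"] = uri

ds.to_netcdf("simulation_annotated.nc")
```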