ESM-VFC / esm-vfc-api-demo


Use CoverageJSON for serializing model (and observational) data? #8

Closed benbovy closed 3 years ago

benbovy commented 3 years ago

CoverageJSON is "a format for publishing geotemporal data to the web". I think it is very relevant for our case. We should probably just use it for much easier integration between the backend and the frontend, rather than using "plain" GeoJSON and defining our own specifications on top of it (which would require many iterations).

Advantages of CoverageJSON:

GeoJSON might still be useful at some places, though, especially for API query/post parameters.

Any thoughts @alirezamdv @willirath @koldunovn @suvarchal ?

willirath commented 3 years ago

👍

koldunovn commented 3 years ago

Looks good! The only thing that worries me a bit is that it's still in a draft stage and has not been updated since 2015 :)

benbovy commented 3 years ago

> The only thing that worries me a bit is that it's still in a draft stage and has not been updated since 2015 :)

Yes that's a good point. Not a big issue if we just use the specs without relying too much on 3rd-party tools implementing it, though. The fact that an active project like pygeoapi uses it by default (for its xarray and rasterio providers) makes me a bit less worried too.

alirezamdv commented 3 years ago

@benbovy I have no experience with it, but it looks easy to use with Leaflet.

alirezamdv commented 3 years ago

@benbovy this format is only accessible through a third-party library, and that library is only available for Leaflet. I couldn't find one for OpenLayers, and it is not easy to style... In my opinion GeoJSON would be better, and for large data maybe TopoJSON: https://github.com/topojson/topojson

benbovy commented 3 years ago

Yes, I agree there are not many tools currently available (and/or well maintained) to handle the CoverageJSON format in a convenient way, and that's a bit unfortunate. That said, I'm still (quite strongly) in favor of using that format.

As far as I'm familiar with the GeoJSON format, it is too generic for our use cases IMO. How should we deal with the temporal dimension in model outputs? How should we encode model field metadata like units, variable names, etc.? How should we encode tiled data? Those are issues that have already been solved by CoverageJSON. Ocean models like NEMO or FESOM2 usually store their outputs using the netCDF data model. As I understand it, CoverageJSON has been designed specifically as a bridge between the netCDF data model and the "OGC-like" formats commonly used to handle geospatial data in frontend applications, which seems very much what we actually need.
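To make the point about time axes and field metadata concrete, here is a minimal sketch of a gridded coverage built as a plain Python dict. The key names follow a reading of the CoverageJSON draft spec and should be checked against it; the helper `make_grid_coverage`, the parameter name `sst`, and all values are hypothetical and are not taken from #11.

```python
import json


def make_grid_coverage(lons, lats, times, values, name, unit_symbol, label):
    """Build a minimal CoverageJSON-style Grid coverage (illustrative only)."""
    expected = len(times) * len(lats) * len(lons)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return {
        "type": "Coverage",
        "domain": {
            "type": "Domain",
            "domainType": "Grid",
            # The temporal dimension is just another axis of the domain.
            "axes": {
                "x": {"values": list(lons)},
                "y": {"values": list(lats)},
                "t": {"values": list(times)},
            },
            "referencing": [
                {
                    "coordinates": ["x", "y"],
                    "system": {
                        "type": "GeographicCRS",
                        "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
                    },
                },
                {
                    "coordinates": ["t"],
                    "system": {"type": "TemporalRS", "calendar": "Gregorian"},
                },
            ],
        },
        # Field metadata (units, human-readable name) lives here.
        "parameters": {
            name: {
                "type": "Parameter",
                "unit": {"symbol": unit_symbol},
                "observedProperty": {"label": {"en": label}},
            }
        },
        # The data itself, flattened in (t, y, x) order.
        "ranges": {
            name: {
                "type": "NdArray",
                "dataType": "float",
                "axisNames": ["t", "y", "x"],
                "shape": [len(times), len(lats), len(lons)],
                "values": list(values),
            }
        },
    }


# Made-up example: 2 time steps on a 2 x 3 grid.
cov = make_grid_coverage(
    lons=[-30.0, -29.5, -29.0],
    lats=[45.0, 45.5],
    times=["2020-01-01T00:00:00Z", "2020-01-02T00:00:00Z"],
    values=[12.0 + 0.1 * i for i in range(12)],
    name="sst",
    unit_symbol="degC",
    label="Sea surface temperature",
)
payload = json.dumps(cov)
```

The axes, parameters and ranges map fairly directly onto netCDF coordinates, variable attributes and variable data, which is the "bridge" role described above.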

I'm not sure how TopoJSON would work better. As with GeoJSON, we would still need a good amount of work to adapt the netCDF model to it.

In #11 I don't rely on any third-party libraries, I've rather implemented the CoverageJSON specs (the part that we need). It didn't take me much effort. Maybe on the frontend side you could implement some helper functions to deal with the format (e.g., CoverageJSON / GeoJSON converters)?
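As a rough idea of what such a CoverageJSON / GeoJSON converter could look like, here is a minimal sketch that turns a trajectory-like coverage into GeoJSON Point features. It is written in Python for illustration; the same logic would have to be ported to JavaScript on the frontend. The function name `trajectory_to_geojson` is hypothetical, and the layout of the `composite` axis (a list of `[t, x, y]` tuples) follows a reading of the CoverageJSON draft that should be verified against the spec.

```python
def trajectory_to_geojson(coverage):
    """Convert a CoverageJSON-style Trajectory coverage to a GeoJSON FeatureCollection.

    Each vertex of the trajectory becomes one Point feature whose properties
    hold the time stamp and the value of every parameter at that vertex.
    """
    axis = coverage["domain"]["axes"]["composite"]
    names = axis["coordinates"]  # e.g. ["t", "x", "y"]
    features = []
    for i, tup in enumerate(axis["values"]):
        coords = dict(zip(names, tup))
        props = {"time": coords.get("t")}
        for pname, rng in coverage["ranges"].items():
            props[pname] = rng["values"][i]
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [coords["x"], coords["y"]],
                },
                "properties": props,
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Made-up three-vertex ship track with one parameter.
track = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "Trajectory",
        "axes": {
            "composite": {
                "dataType": "tuple",
                "coordinates": ["t", "x", "y"],
                "values": [
                    ["2020-01-01T00:00:00Z", -30.0, 45.0],
                    ["2020-01-01T01:00:00Z", -29.9, 45.1],
                    ["2020-01-01T02:00:00Z", -29.8, 45.2],
                ],
            }
        },
    },
    "ranges": {
        "sst": {
            "type": "NdArray",
            "dataType": "float",
            "axisNames": ["composite"],
            "shape": [3],
            "values": [12.1, 12.3, 12.2],
        }
    },
}
fc = trajectory_to_geojson(track)
```

The result can be handed to any GeoJSON-aware map layer, at the cost of losing the parameter metadata unless it is copied over separately.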

As there are not many tools for CoverageJSON, there is some implementation effort required on both the backend and frontend sides, but at least we don't have to settle on a data model design, which would require a lot of work and discussion anyway IMO.

benbovy commented 3 years ago

A few more thoughts after some more research and self-education (to be honest, my experience with handling geospatial formats in web applications is pretty limited).

My understanding is that the GeoJSON specs allow custom properties to be defined at the feature level but not at the geometry level. This means that in theory we cannot use multi-vertex geometries like LineString, MultiLineString or MultiPoint to store information like time or model data (one or more fields) at the vertices of those complex geometries (e.g., ship track line vertices, set of station points). So we would have to create a Point feature for every vertex instead, wrapped in a FeatureCollection. While this is probably fine with just a few dozen vertices, for bigger queries we might quickly hit some serious performance issues.
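The trade-off can be sketched with a synthetic ship track. Option A keeps a single LineString and stuffs per-vertex data into feature-level properties as parallel arrays, which is an ad-hoc convention that generic GeoJSON clients will not interpret. Option B is the one-Point-feature-per-vertex layout described above. All names and values are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Synthetic ship track (made-up values, for illustration only).
n = 1000
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
lons = [round(-30.0 + 0.01 * i, 4) for i in range(n)]
lats = [round(45.0 + 0.005 * i, 4) for i in range(n)]
times = [
    (start + timedelta(minutes=10 * i)).strftime("%Y-%m-%dT%H:%M:%SZ")
    for i in range(n)
]
temps = [round(12.0 + 0.001 * i, 3) for i in range(n)]

# Option A: one LineString feature. Properties belong to the whole feature,
# so per-vertex values can only be carried as parallel arrays by convention.
line_feature = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[x, y] for x, y in zip(lons, lats)],
    },
    "properties": {"times": times, "temperature": temps},
}

# Option B: one Point feature per vertex, wrapped in a FeatureCollection.
point_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": {"time": t, "temperature": v},
        }
        for x, y, t, v in zip(lons, lats, times, temps)
    ],
}

# Compare the serialized payload sizes of the two layouts.
size_line = len(json.dumps(line_feature))
size_points = len(json.dumps(point_collection))
print(f"LineString + arrays: {size_line} bytes")
print(f"Point per vertex:    {size_points} bytes")
```

Option B repeats the feature and geometry boilerplate for every vertex, so its payload grows faster with the number of vertices, and the client also has to create one map object per vertex.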

There have been some proposals to extend GeoJSON to deal with those specific issues, like here and here, but they seem stale.

TopoJSON is more compact than GeoJSON for some cases (e.g., many features with complex geometries that share the same arcs), but I'm not sure if that applies to our case since we would have to deal with many point features anyway.

I guess most people currently use ad-hoc solutions to deal with those issues? That's probably why some folks tried to come up with new, better-suited formats like CoverageJSON.

Alternative candidates that I've found so far:

  1. cf-json
  2. netcdf-ld
  3. coverage-json

1 and 2 are still drafts and are too close to the (CF-)netCDF data model IMO, so they won't play nicely with OGC-friendly frontend tools. 3 is approved by OGC, but to my understanding it is limited to grid/mesh domains only (i.e., it doesn't provide solutions for data extracted along sections, trajectories, profiles, etc.). This leaves us with CoverageJSON, which seems the best suited to our problem among those (tentative) standards. According to the activity in https://github.com/opengeospatial/ogc_api_coverages, its integration as an OGC standard is still actively discussed.

To summarize, sadly there's still no easy way (widely adopted standard and/or mature tools) to handle geotemporal model outputs in web applications. While I'm advocating here for CoverageJSON, I'm really open to discussion and to any alternative - "standard" or ad-hoc - solution that would better suit our needs and constraints from a full-stack perspective!

Some useful links can be found in the discussion here: https://github.com/pangeo-data/pangeo-datastore/issues/3

alirezamdv commented 3 years ago

Thank you for your detailed description, let's stay with CoverageJSON... Maybe we should, as you said, fit our own solution to it and turn it into our adaptable data structure... I'm going to try to modify and extract some pieces of code from this extended Leaflet library to create a module to work with in the frontend, and I will try to find out how to style it properly.

benbovy commented 3 years ago

I've just had a deeper look at the covjson-reader and leaflet-coverage libraries, and although I'm not an expert in JavaScript, I now realize that they represent more work than what I've done in #11.

Depending on how the experiments go on your side, we could still go back to some home-made GeoJSON + custom JSON solution and see how it goes. At this stage, I think we could manage to maintain the two approaches on the backend side.