Ouranosinc / pavics-sdi

Power Analytics and Visualization for Climate Science - Spatial Data Infrastructure
https://pavics-sdi.readthedocs.io

Investigate STAC catalog option #187

Open huard opened 3 years ago

huard commented 3 years ago

STAC could be used server side as a catalog server. If I understand correctly, there is an ESM extension in development by the Pangeo community.

Ideally, THREDDS would have a STAC endpoint serving STAC + ESM catalogs. This is, however, unlikely to happen in the near to medium term. I suspect we'll have to deploy software that converts the THREDDS catalog to a STAC + ESM catalog, then serve the result from a dedicated STAC server. The client could be intake-stac with intake-xarray.
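As a rough illustration of the conversion step, here is a minimal sketch that walks a THREDDS catalog XML document and emits one STAC-Item-like dictionary per file. The sample catalog, the `thredds_to_stac_items` helper, and the `example.org` base URL are all hypothetical. The resulting items are deliberately incomplete: a valid STAC Item also needs temporal information (and any ESM-specific fields), which would have to come from the file metadata.

```python
import xml.etree.ElementTree as ET

# XML namespace of THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog fragment, for illustration only.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example" ID="example">
    <dataset name="tas_day.nc" ID="example/tas_day.nc" urlPath="example/tas_day.nc"/>
    <dataset name="pr_day.nc" ID="example/pr_day.nc" urlPath="example/pr_day.nc"/>
  </dataset>
</catalog>"""


def thredds_to_stac_items(catalog_xml, opendap_base):
    """Return a list of STAC-Item-like dicts, one per dataset with a urlPath.

    Hypothetical helper: only the structural fields are filled in.
    """
    root = ET.fromstring(catalog_xml)
    base = opendap_base.rstrip("/")
    items = []
    # Only datasets carrying a urlPath attribute point at actual files;
    # container datasets are skipped.
    for ds in root.iterfind(".//t:dataset[@urlPath]", NS):
        url_path = ds.get("urlPath")
        items.append({
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": ds.get("ID", url_path).replace("/", "_"),
            "geometry": None,
            # Placeholder: a real converter must set datetime (or
            # start_datetime/end_datetime) from the file's time axis.
            "properties": {"datetime": None},
            "links": [],
            "assets": {
                "opendap": {
                    "href": f"{base}/{url_path}",
                    "title": ds.get("name"),
                    "roles": ["data"],
                }
            },
        })
    return items


if __name__ == "__main__":
    for item in thredds_to_stac_items(SAMPLE_CATALOG, "https://example.org/thredds/dodsC"):
        print(item["id"], item["assets"]["opendap"]["href"])
```

The items produced this way could then be loaded into whichever STAC server is chosen; that ingestion step depends on the server and is not shown here.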

My suggestion for short term progress would be to

While

Tools

References

philipkershaw commented 3 years ago

Hi David @huard - found this issue from https://github.com/pangeo-forge/cmip6-pipeline/issues/7. Just wanted to highlight that for the ESGF future architecture work we are planning to ditch using THREDDS catalogues for datasets. The new container-based release of ESGF that we are trialling does just this. All dataset information would need to be referenced directly from the ESGF Search API. In the longer run we are looking at community standards for the ESGF search API. An ESM profile of STAC could be a good candidate.

huard commented 3 years ago

Hi Phil @philipkershaw !

We are relying on the NcML API to provide aggregated views of the multiple files that make up a single dataset (periods, members, variables), as well as on OGC APIs (WMS, WCS). Do you know if support for these APIs is in the cards for the new ESGF stack?

I think it would be worthwhile for us to get better acquainted with the roadmap for the new ESGF architecture, so that we can schedule our own efforts and collaborate. Are there documents we can look into?

philipkershaw commented 3 years ago

There would be nothing to stop a data provider in ESGF generating specific THREDDS catalogues and NcML for the purposes of aggregations or whatever else. We have had some experience of this at CEDA with the ESA CCI data we host.

However, the change for the future architecture is that the new publishing system would not generate THREDDS catalogues per dataset as part of the publishing process. We are still some time away from having deployment of the new system across the federation. Planning is still underway for the roadmap to get to this point. The next step is to deploy the new system as a pilot at sites that would like to participate. There is a project board for the new work:

https://github.com/orgs/ESGF/projects/1

...but this is more of a detailed development view than a broader roadmap. There is also the ESGF Future Architecture report, which has an analysis of the existing system, proposals for changes, and a roadmap for implementing them:

https://doi.org/10.5281/zenodo.3928222

I will try and keep you up to date with our plans and would be happy to discuss further if you have questions.