Open-EO / openeo-api

The openEO API specification
http://api.openeo.org
Apache License 2.0

Using STAC for data discovery? #64

Closed by m-mohr 6 years ago

m-mohr commented 6 years ago

Simon from Google drew my attention to STAC (SpatioTemporal Asset Catalog) and STAM (SpatioTemporal Asset Metadata). Both are new, evolving standards that we could take into consideration for data discovery. At the very least, they are a good source to check our own standard against, e.g. to add a license field to our data set description. They use JSON structures, too, which makes them a better fit for our own JSON-based API. I'll go through them now and see what we might want to adopt.

m-mohr commented 6 years ago

STAM seems to be mostly about individual images. That doesn't really fit our purpose. Nevertheless, there are some ideas I got from STAM:

m-mohr commented 6 years ago

STAC seems to be too much of a catalogue. For example, it requires listing asset objects that can be downloaded, which is not suitable for us. The same might apply to thumbnails.

What they are missing are band descriptions. What we are missing are properties like cloud cover, resolution, etc. (see their spec).

It seems that adopting these two standards will not be sufficient for us...

nuest commented 6 years ago

If you want a useful license field, I can recommend the data from the Open Licenses Service, nice JSON with identifiers (SPDX) and names: http://licenses.opendefinition.org/

Alternatively: https://github.com/sindresorhus/spdx-license-list
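
A minimal sketch of what such a license field could look like in a data set description. The field names and the tiny identifier set are illustrative assumptions, not part of the openEO or STAC specifications; a real check would load the full SPDX list from one of the sources above.

```python
# Hypothetical sketch: field names and the small license set are assumptions
# made for illustration only.
KNOWN_SPDX_IDS = {"Apache-2.0", "CC-BY-4.0", "CC0-1.0", "MIT"}


def check_license(dataset: dict) -> bool:
    """Return True if the description carries a known SPDX identifier."""
    return dataset.get("license") in KNOWN_SPDX_IDS


dataset = {
    "data_id": "example-collection",  # hypothetical identifier
    "description": "Example data set description",
    "license": "CC-BY-4.0",
}

print(check_license(dataset))
```

Storing the identifier rather than a free-text license name keeps the field machine-readable and searchable.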

edzer commented 6 years ago

Maybe ask @cholmes: Chris, do you know whether STAC/STAM provide data descriptions at the collection level?

m-mohr commented 6 years ago

Simon just mentioned that some EO-related changes to STAC are in the dev branch, including bands. See this example, which includes band information. Nevertheless, it's still all about files and not image collections?!

Another interesting option is the OpenSearch EO Extension with its new GeoJSON Encoding as mentioned in #68.

cholmes commented 6 years ago

The landsat example is out of date. We discussed a plan for bands, and @matthewhanson is working on the EO profile. It will have a more sensible approach for bands, and it'd be great to collaborate with you all on it. The notes on what we're planning to do are at https://github.com/radiantearth/community-sprints/blob/master/03072018-ft-collins-co/notes/stac-eo.md#asset-definition

Though it's just a rough sketch, I'd wait till Matt is able to get the EO profile up, which really necessitates the band descriptions and other collection-level stuff. Our goal is really search of actual data, but we don't want to repeat all the collection data at the individual record level, so it will have a place where that common stuff can be defined.
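
To illustrate the idea of defining common metadata once per collection instead of on every record, here is a rough sketch. All names (`resolve_item`, the `collection` key, the band and property names) are hypothetical and not taken from the STAC EO profile, which was still being drafted at this point.

```python
def resolve_item(item: dict, collections: dict) -> dict:
    """Merge collection-level properties into an item; item values win."""
    common = collections.get(item.get("collection"), {})
    merged = dict(common)                      # copy the shared properties
    merged.update(item.get("properties", {}))  # item-specific values override
    return {**item, "properties": merged}


# Shared metadata, defined once per collection (hypothetical values).
collections = {
    "landsat-8": {"platform": "landsat-8", "bands": ["B1", "B2", "B3"]},
}

# An individual record only carries what is specific to it.
item = {
    "id": "scene-001",
    "collection": "landsat-8",
    "properties": {"cloud_cover": 12},
}

resolved = resolve_item(item, collections)
print(resolved["properties"])
```

The record stays small on the wire, and a client can expand it on demand when it needs the band information.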

Not sure what you mean by 'it's still all about files and not image collections'. It's links to assets; those don't have to be actual files online, but each is a reference to something that you can download. Or does openEO just provide search at the collection level, not individual collects? How do you actually acquire an image?

I saw openEO presented last week at the OGC meeting; it looks like a great effort. It would be great to align, though it may be better to talk than to try to figure it out in tickets.

I've looked at the OpenSearch EO Extension GeoJSON encoding; there's decent documentation of it. My hope is that we can use JSON-LD type structures to share some of the naming, though we'll likely do a small subset of all that they have.

cholmes commented 6 years ago

And +1 to the SPDX license list, that's what we specified. The one downside is that it doesn't give much guidance for non-open licenses, and we do want to expose that data for search as well.

cholmes commented 6 years ago

Do you all have band descriptions? We'd be happy to share definitions for those; we're about to put the first version out. We definitely need them for the EO profile.

edzer commented 6 years ago

Thanks! +1 on discussing in person first; will send you an email.

m-mohr commented 6 years ago

Good morning @cholmes, thanks for all the information, highly appreciated. I looked through the examples and meeting notes, and it is now much clearer where you are heading. It seems to be the right direction, also for our use case. The old examples gave me a wrong impression, I think. Looking forward to discussing things in person.

m-mohr commented 6 years ago

STAC has made good progress and released version 0.4. At the moment it is not yet at a stage that fulfills our requirements, but there are ideas and plans that would allow us to use STAC. See https://github.com/radiantearth/stac-spec/issues/81 for an important discussion.

Unfortunately, there is this part of the spec: "All static catalogs must contain at least 1 Asset, as the point of the SpatioTemporal Asset Catalog is to be link to actual actual data, not to just reference metadata (though it is not required that all users have permissions to access the asset).". We are currently just referencing metadata. Providers might want to link to their assets, but some might not want to do that.
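
To make the conflict concrete, here is a small sketch of the check a strict validator might apply. It assumes the assets live under an `assets` key of each item; that key name and the function name are assumptions for illustration, not a quotation of the STAC schema.

```python
def has_asset(item: dict) -> bool:
    """Return True if the item references at least one asset."""
    # Works whether assets are given as a list or as a keyed object.
    return bool(item.get("assets"))


# A metadata-only description, as we currently provide it (hypothetical).
metadata_only = {"id": "example-collection", "properties": {"license": "CC-BY-4.0"}}

# The same description with one linked asset (hypothetical URL).
with_asset = {**metadata_only, "assets": [{"href": "https://example.com/scene.tif"}]}

print(has_asset(metadata_only), has_asset(with_asset))
```

A back-end that only references metadata would fail such a check unless the requirement is relaxed or the provider chooses to link its assets.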

m-mohr commented 6 years ago

Another idea is to be compatible with WFS 3.0, which has also been adopted by STAC. This would mean renaming /data to /collections, but that shouldn't be a problem. Example: https://cmr-stac-api.dev.element84.com/docs/index.html That document also has some other nice ideas, e.g. how to define the temporal reference system.
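
A rough sketch of how a back-end could keep old clients working during such a rename by mapping legacy paths to the new ones. The helper name is hypothetical, and whether a transition period is wanted at all is an open question.

```python
def to_collections_path(path: str) -> str:
    """Map a legacy /data path to its /collections equivalent."""
    # Match /data itself or anything below it, but not e.g. /database.
    if path == "/data" or path.startswith("/data/"):
        return "/collections" + path[len("/data"):]
    return path


for old in ["/data", "/data/example-collection", "/jobs"]:
    print(old, "->", to_collections_path(old))
```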

m-mohr commented 6 years ago

Will be handled with #114.