j08lue opened 3 weeks ago
The CDSE Sentinel-2 L2A data are in JPEG2000. Performance / efficiency of reading overviews from those is not great, as @vincentsarago documented here.
What alternatives do we have? Looking at the CloudFerro STAC https://radiantearth.github.io/stac-browser/#/external/https://pgstac.demo.cloudferro.com - the only collection with COGs seems to be Sentinel-1 Ground Range Detected (GRD).
It uses the alternate assets extension for S3 links, too.
We may need to add alternate assets support to eoapi-k8s:
As discussed here:
Ok, the CDSE Sentinel-2 L2A data are in JPEG2000. Performance / efficiency of reading overviews from those is not great, as @vincentsarago documented here.
This blog post is quite old. Are the results still valid? The visualization services from Sinergise and the Copernicus Data Space Ecosystem use the JPEG2000 format.
What alternatives do we have? Looking at the CloudFerro STAC https://radiantearth.github.io/stac-browser/#/external/https://pgstac.demo.cloudferro.com - the only collection with COGs seems to be Sentinel-1 Ground Range Detected (GRD).
Sentinel-1 is not an option as you need to conduct some pre-processing steps.
This blog post is quite old. Are the results still valid? The visualization services from Sinergise and the Copernicus Data Space Ecosystem use the JPEG2000 format.
If I remember correctly, Sinergise uses a proprietary driver to read the JPEG2000 files. The GDAL maintainers made some improvements to the GDAL and OpenJPEG drivers, but reading JPEG2000 is still not as efficient as reading COGs.
Sentinel-1 is not an option as you need to conduct some pre-processing steps.
What kind of pre-processing? You can visualize the Sentinel-1 GRD data, which are stored as COGs.
The GRD data CloudFerro hosts is already terrain-corrected, at least. The thumbnails they reference look ok?
But perhaps we could still use the JPEG2000 Sentinel-2 L2A for demo purposes and see how it goes in terms of speed and GET requests to CloudFerro S3.
Note on CreoDIAS access from TiTiler deployed in EOEPCA k8s: fundamentally, we should have access. Might need to generate some kind of credentials (@rconway knows).
S3 key generation on CDSE (free, with predefined monthly quota limits): https://documentation.dataspace.copernicus.eu/APIs/S3.html
S3 key extraction on CreoDIAS (data transfer is not limited, even on the smallest machines): https://creodias.docs.cloudferro.com/en/latest/eodata/How-to-get-credentials-used-for-accessing-EODATA-on-a-cloud-VM-on-Creodias.html
Hope this helps.
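A minimal sketch of how those S3 keys could be handed to GDAL (which TiTiler and rasterio use to read the assets), assuming the EODATA bucket is addressed as `s3://eodata/...` with path-style requests against the endpoint host `eodata.dataspace.copernicus.eu`. The helper names `cdse_gdal_env` and `to_vsis3` are hypothetical; the option names are standard GDAL `/vsis3/` configuration options.

```python
# Endpoint host from the CDSE S3 documentation (host only; GDAL adds the scheme).
CDSE_S3_ENDPOINT = "eodata.dataspace.copernicus.eu"


def cdse_gdal_env(access_key: str, secret_key: str) -> dict:
    """Hypothetical helper: GDAL config options for reading CDSE EODATA via /vsis3/."""
    return {
        "AWS_S3_ENDPOINT": CDSE_S3_ENDPOINT,
        "AWS_HTTPS": "YES",
        # Path-style addressing (https://<endpoint>/eodata/...) instead of bucket subdomains.
        "AWS_VIRTUAL_HOSTING": "FALSE",
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }


def to_vsis3(href: str) -> str:
    """Hypothetical helper: turn an s3://bucket/key href into a GDAL /vsis3/bucket/key path."""
    prefix = "s3://"
    if not href.startswith(prefix):
        raise ValueError(f"not an s3:// href: {href}")
    return "/vsis3/" + href[len(prefix):]
```

In a TiTiler deployment these would typically be set as container environment variables (GDAL also reads config options from the environment); with rasterio, `rasterio.Env(**cdse_gdal_env(key, secret))` could wrap a `rasterio.open(to_vsis3(href))` call.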
Status - we decided to:
* s3 links to main href and (cleanup) remove alternate assets references
* /cog endpoint on eoAPI? https://eoapi.develop.eoepca.org/raster/api.html
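The "s3 links to main href" step could look roughly like this, assuming the items follow the Alternate Assets extension layout (`assets[key]["alternate"]["s3"]["href"]`) and that an alternate entry may carry its own `auth:refs` from the Authentication extension. `promote_alternate` is a hypothetical helper, not part of pystac or eoAPI. Note that the alternate entry's other fields (such as its `description`) are dropped, so anything worth keeping has to be copied elsewhere first.

```python
import copy


def promote_alternate(item: dict, alt_key: str = "s3") -> dict:
    """Return a copy of a STAC item whose asset hrefs point at the `alt_key` alternate.

    Every "alternate" block is removed (the cleanup step); assets without an
    `alt_key` alternate keep their original href.
    """
    out = copy.deepcopy(item)
    for asset in out.get("assets", {}).values():
        alternates = asset.pop("alternate", None) or {}
        alt = alternates.get(alt_key)
        if alt is None:
            continue
        asset["href"] = alt["href"]
        # Swap auth: the promoted href needs the alternate's auth refs, not the original ones.
        if "auth:refs" in alt:
            asset["auth:refs"] = alt["auth:refs"]
        else:
            asset.pop("auth:refs", None)
    # No asset uses "alternate" any more, so drop the extension schema URL too.
    if "stac_extensions" in out:
        out["stac_extensions"] = [
            url for url in out["stac_extensions"] if "alternate-assets" not in url
        ]
    return out
```

Working on a deep copy keeps the source item untouched, which makes it easy to diff before and after when copying items into the eoAPI catalog.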
Btw, the Planetary Computer Explorer can generate nice code snippets for querying.
Btw, the collection metadata also needs a bit of cleanup after removing the alternate assets: item_assets, auth:schemes and perhaps stac_extensions.
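A possible shape for that collection cleanup, assuming the `item_assets` entries mirror the item assets (including `alternate` and `auth:refs`). Whether `auth:schemes` should go entirely depends on whether the promoted S3 hrefs still declare auth, so here it is an opt-in flag; `clean_collection` and `drop_auth_schemes` are hypothetical names.

```python
import copy


def clean_collection(collection: dict, drop_auth_schemes: bool = False) -> dict:
    """Hypothetical helper: strip alternate-asset metadata from a STAC collection copy."""
    out = copy.deepcopy(collection)
    item_assets = out.get("item_assets", {})
    for asset_def in item_assets.values():
        asset_def.pop("alternate", None)
    if "stac_extensions" in out:
        out["stac_extensions"] = [
            url for url in out["stac_extensions"] if "alternate-assets" not in url
        ]
    if drop_auth_schemes:
        out.pop("auth:schemes", None)
        # Without schemes, per-asset references and the extension URL would dangle.
        for asset_def in item_assets.values():
            asset_def.pop("auth:refs", None)
        if "stac_extensions" in out:
            out["stac_extensions"] = [
                url for url in out["stac_extensions"] if "/authentication/" not in url
            ]
    return out
```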
So once a subset is decided, replace each asset's href with the alternate:s3 href and swap its auth for the alternate's.
Where would the alternate's "description": "S3 storage provided by CloudFerro Cloud and OpenTelekom Cloud (OTC). Use endpoint URL https://eodata.dataspace.copernicus.eu." live on the item? It appears to be a property on the alternate entry, but I couldn't see where it would live on the asset afterwards.
Regarding which subset, anything easy enough to handle is fine.
* 2024 to date, all of Europe?
* Past 3 years, whatever country/region - Iceland? 🤷 🇮🇸 🌋 🧊
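Either option maps onto a STAC API item search. A sketch of the POST /search body, where `end=None` produces an open-ended interval for "to date"; the helper name, the example dates and the Iceland bounding box (very rough, for illustration only) are assumptions, while "sentinel-2-l2a" is the collection id used by the CloudFerro STAC.

```python
from datetime import date
from typing import Optional, Sequence

# Very rough lon/lat bounding box of Iceland (EPSG:4326), for illustration only.
ICELAND_BBOX = (-24.6, 63.2, -13.4, 66.6)


def item_search_body(
    collection: str,
    bbox: Sequence[float],
    start: date,
    end: Optional[date] = None,
    limit: int = 100,
) -> dict:
    """Hypothetical helper: build a STAC API POST /search body for one collection."""
    # STAC datetime intervals are "start/end"; ".." leaves the end open ("to date").
    end_part = ".." if end is None else f"{end.isoformat()}T23:59:59Z"
    return {
        "collections": [collection],
        "bbox": [float(v) for v in bbox],
        "datetime": f"{start.isoformat()}T00:00:00Z/{end_part}",
        "limit": limit,
    }


# "Past 3 years, Iceland" with explicit example dates.
body = item_search_body("sentinel-2-l2a", ICELAND_BBOX, date(2022, 1, 1), date(2024, 12, 31))
```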
Since pgstac.demo.cloudferro.com is still in development and its S2L2A collection contains only a few items, here is a JSON file containing 9,352 products that intersect Poland's geometry, with content_start_date >= '2023-01-01 00:00:00' and content_start_date < '2024-01-01 00:00:00' (one product is missing: /eodata/Sentinel-2/MSI/L2A/2023/03/29/S2B_MSIL2A_20230329T100629_N0509_R022_T33UWV_20230329T130657.SAFE).
File size: 656.5 MB. Hope it will help: https://s3.fra1-2.cloudferro.com/swift/v1/poland-stac/poland-data.json
Ah, was not aware, thank you!
Poland 2023 works, let's see whether we need to subset further if 9k is too much for a quick fix. 🇵🇱
@MathewNWSH are you able to export this as a list of items in JSON?
I can't load it with json.load() right now..
@ciaransweet
Each line consists of a STAC JSON item. The following works for me to print the first STAC item:
head -n1 poland-data.json | jq . | less
Sure, it would be nice if it was wrapped into a json array to be a bit more 'valid' to read in :D
I'll process line by line for now.
Understood. The format (ndjson) is used by pypgstac as well. I guess this is why @MathewNWSH has made this format available.
Cool thanks, good to know!
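For reference, a line-by-line reader for the ndjson dump, using only the standard library (the function names are hypothetical). It streams items so the 656.5 MB file never has to be parsed as one document, and can optionally wrap everything into a FeatureCollection when a single JSON document is really needed.

```python
import json
from typing import Iterator


def iter_ndjson_items(path: str) -> Iterator[dict]:
    """Hypothetical helper: yield one STAC item (as a dict) per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def ndjson_to_feature_collection(path: str) -> dict:
    """Hypothetical helper: wrap all items in one FeatureCollection (loads everything into memory)."""
    return {"type": "FeatureCollection", "features": list(iter_ndjson_items(path))}
```

Each dict can then be turned into a `pystac.Item` with `pystac.Item.from_dict(...)`; for the full file, the streaming generator is the safer choice.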
Yup, I've just loaded it into the pgstac instance using:
pypgstac load items https://s3.fra1-2.cloudferro.com/swift/v1/poland-stac/poland-data.json
If you prefer, you can get it as a list of items from https://pgstac.demo.cloudferro.com/collections/sentinel-2-l2a/items?limit=1000, then move to the next page with https://pgstac.demo.cloudferro.com/collections/sentinel-2-l2a/items?limit=1000&token=next:sentinel-2-l2a:S2A_MSIL2A_20231123T095321_N0509_R079_T34UCC_20231123T141252, and so on.
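Walking those pages can be automated by following each response's `rel="next"` link instead of building the `token=next:...` URLs by hand. A sketch with the standard library, assuming GET-style next links as in the URLs above; `iter_items` is hypothetical, and `fetch` is injectable so the paging logic can be tested without the network.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_json(url: str) -> dict:
    """GET a URL and parse the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_items(first_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Hypothetical helper: yield items page by page, following rel="next" links (GET only)."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("features", [])
        # The next link's href carries the server-generated paging token.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
```

Usage would be `for item in iter_items("https://pgstac.demo.cloudferro.com/collections/sentinel-2-l2a/items?limit=1000"): ...`; POST-based next links (with a `body` to resend) are not handled here.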
For 1000 S2L2A items it takes 1.67 min to reply ;/ That's quite a lot, but on the other hand this collection is the richest in metadata among all Sentinels / processing levels.
No worries :D We should have specified that we weren't expecting it for pypgstac, just pystac, but I can work with it now that I know it's line-delimited, thanks!
for 1000 items of s2l2a it takes 1.67 min to reply ;/
That is quite slow. The same query for sentinel-2-l2a took 4 seconds (without caching) on my pgstac-based STAC API with a total of 11 million scenes within this collection.
The demo is based on https://github.com/stac-utils/stac-fastapi-pgstac/blob/main/docker-compose.yml, deployed via docker compose up. Soon we will move to a bare-metal server (I'm waiting for the PostgreSQL 17 release).
This is a sample S2L2A item; as you can see, it's quite huge for a single item: S2B_MSIL2A_20240110T100309_N0510_R122_T33UVR_20240110T113053.json
I was blaming the compose deployment and the huge item size for the performance problems.
Would it be possible to exchange experience with you on using stac-fastapi-pgstac? Or do you have general recommendations for working with pgstac/stac-fastapi-pgstac?
this is the sample item of S2L2a, as you can see it's quite huge for a single item
Yes, this is quite huge. Our STAC item (generated with the stactools-sentinel2 package) is smaller (we deleted downsampled assets):
https://stac.terrabyte.lrz.de/public/api/collections/sentinel-2-c1-l2a/items/S2B_MSIL2A_20240110T100309_N0510_R122_T33UVR_20240110T113053
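One way to slim items down along those lines, assuming asset keys carry a resolution suffix such as `B02_10m`/`B02_20m`/`B02_60m` (the key naming is an assumption about the CloudFerro items). This is a hypothetical helper, not how stactools-sentinel2 builds its items: it keeps only the finest resolution per band and leaves all other assets untouched.

```python
import copy
import json
import re

# Assumption: resolution-specific asset keys look like "B02_10m", "B02_20m", "SCL_60m".
RES_KEY = re.compile(r"^(?P<band>.+)_(?P<res>\d+)m$")


def keep_finest_resolution(item: dict) -> dict:
    """Hypothetical helper: keep only the finest-resolution asset per band; others untouched."""
    out = copy.deepcopy(item)
    assets = out.get("assets", {})
    best = {}  # band -> (resolution in metres, asset key)
    for key in assets:
        m = RES_KEY.match(key)
        if m:
            band, res = m["band"], int(m["res"])
            if band not in best or res < best[band][0]:
                best[band] = (res, key)
    keep = {key for _, key in best.values()}
    out["assets"] = {k: a for k, a in assets.items() if not RES_KEY.match(k) or k in keep}
    return out


def json_size(obj: dict) -> int:
    """Size of the serialised JSON in bytes, to compare before/after."""
    return len(json.dumps(obj).encode("utf-8"))
```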
Would it be possible to exchange experience with you on using stac-fastapi-pgstac? Or do you have general recommendations for working with pgstac/stac-fastapi-pgstac?
Sure. We also have an issue about performance improvements and best-practice guidelines within EOEPCA (https://github.com/EOEPCA/resource-discovery/issues/23). I just commented there.
@MathewNWSH Please feel free to contact me via mail as well: jonas.eberle@dlr.de
For testing the coverages API etc., we could use some more sample data in our eoAPI dev catalog.
A good example would be multi-spectral data, ideally Sentinel-2 L2A.
Since the EOEPCA+ Kubernetes cluster is deployed in CreoDIAS, perhaps we could even load assets directly from the Sentinel-2 collection in CreoDIAS' S3?
They have a STAC catalog: https://pgstac.demo.cloudferro.com/collections/sentinel-2-l2a/items
It keeps the cloud-loadable asset hrefs under "alternate assets", which TiTiler-PgSTAC currently does not support.
But we could probably work around that, since we will likely want to copy the STAC items to our catalog anyway.
Either way, a small subset of Sentinel-2 L2A scenes would be great to include. It can be regionally and temporally limited, perhaps even just 2x2 MGRS tiles for a year or so.
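Such a subset can be selected from product IDs alone, assuming they follow the naming seen in this thread (e.g. `S2B_MSIL2A_20230329T100629_N0509_R022_T33UWV_20230329T130657`), where the third field holds the sensing start and the `T..` field the MGRS tile. The helpers are hypothetical, and the tile set passed in is a placeholder; the actual 2x2 tiles still need to be picked.

```python
import re
from datetime import date
from typing import Container, Iterable, Iterator, NamedTuple

# Product ID fields: platform, product type, sensing start, baseline, relative orbit, MGRS tile, ...
PRODUCT_ID = re.compile(
    r"^S2[A-Z]_MSIL2A_(?P<date>\d{8})T\d{6}_N\d{4}_R\d{3}_T(?P<tile>\d{2}[A-Z]{3})_"
)


class ProductInfo(NamedTuple):
    tile: str
    sensing_date: date


def parse_product_id(product_id: str) -> ProductInfo:
    """Hypothetical helper: extract the MGRS tile and sensing date from a product ID."""
    m = PRODUCT_ID.match(product_id)
    if m is None:
        raise ValueError(f"unrecognised Sentinel-2 L2A product ID: {product_id}")
    d = m["date"]
    return ProductInfo(m["tile"], date(int(d[:4]), int(d[4:6]), int(d[6:8])))


def select_subset(product_ids: Iterable[str], tiles: Container[str], year: int) -> Iterator[str]:
    """Yield the IDs whose MGRS tile is in `tiles` and whose sensing date falls in `year`."""
    for pid in product_ids:
        info = parse_product_id(pid)
        if info.tile in tiles and info.sensing_date.year == year:
            yield pid
```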
Acceptance criteria