earthpulse / eotdl

Earth Observation Training Datasets
https://eotdl.com
MIT License

Explore options for funding improvements to pystac that could benefit other projects #199

Open tatimars opened 1 week ago

tatimars commented 1 week ago

To keep in mind: this e-mail thread is relevant to this pystac issue:

_"Concerning STAC-FastAPI, we have it on our radar. One of the tasks in the evolution roadmap is to provide a STAC-API. Thomas from Brockman told us they already have one ready that we can interface with, so that is the default choice. However, if their solution is not satisfactory, using STAC-FastAPI is our backup plan.

Concerning the performance issues, I believe that you are referring to performance when serving the STAC metadata. Our main bottleneck, however, is creating the STAC metadata in the first place: going from no metadata at all (only a list of files, local or remote) to STAC metadata that can then be ingested into geodb and served later. This process involves several steps in which the pystac library generates the metadata, adds extensions and validates that the output is correct (the validation is really slow). I hope this makes it clearer.

As I mentioned in past meetings, the solution from Radiant Earth (EOX also mentioned doing this) is NOT to use the official pystac tooling and instead to generate the JSON files by hand, which requires a deep understanding of STAC in order to populate the correct fields. I also presume they are bypassing the validation step and fixing conflicts as they arise. Our proposal would be to do the same or, what I believe would be more impactful, to solve this problem by improving pystac or offering an alternative library (funded, as you suggested, through the EOEPCA project)."_
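For reference, a minimal sketch of what "generating the JSON files by hand" could look like, using only the Python standard library. This is not the Radiant Earth or EOX implementation, which we have not seen. The function name `make_item`, the asset key `data`, the example file path and the bounding box are all hypothetical, and it assumes each file is a GeoTIFF with a known bounding box and acquisition time. No validation is performed, so the correctness of the fields is entirely up to the caller.

```python
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath


def make_item(href, bbox, dt):
    """Build a minimal STAC Item as a plain dict, without pystac.

    href: path or URL of the asset (hypothetical example input)
    bbox: (west, south, east, north) in WGS84 degrees
    dt:   timezone-aware acquisition datetime
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        # Item id taken from the file name without its extension
        "id": PurePosixPath(href).stem,
        # Closed polygon ring built from the bounding box corners
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south],
                [east, south],
                [east, north],
                [west, north],
                [west, south],
            ]],
        },
        "bbox": list(bbox),
        "properties": {
            # STAC expects an RFC 3339 timestamp; normalise to UTC
            "datetime": dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {
            # Assumption: every file is a GeoTIFF
            "data": {"href": href, "type": "image/tiff; application=geotiff"},
        },
    }


def write_item(item, path):
    """Serialise the item dict to a JSON file, skipping any validation."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(item, f, indent=2)
```

Called as `make_item("data/scene_001.tif", (2.0, 41.0, 3.0, 42.0), datetime(2023, 1, 1, tzinfo=timezone.utc))`, this returns a dict that `json.dump` can write directly. The trade-off is the one described in the e-mail: it is fast because nothing is checked, so any mistake in the fields only surfaces later, at ingestion or serving time.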