cedadev / search-futures

Future Search Architecture
BSD 2-Clause "Simplified" License

Review PySTAC for required functionality #129

Closed agstephens closed 2 years ago

agstephens commented 2 years ago

E.g. Can it search/filter?

Mahir-Sparkess commented 2 years ago

https://github.com/cedadev/pystac-client/tree/asset-search

minor changes to stac-fastapi-elasticsearch to be PySTAC compliant:

Mahir-Sparkess commented 2 years ago

PySTAC follows the datetime requirement of the STAC spec, but certain ESGF datasets/files, such as CMIP6.CMIP.CAS.FGOALS-g3.historical.r1i1p1f1.fx.orog.gn.v20201202.orog_fx_FGOALS-g3_historical_r1i1p1f1_gn.nc_0, do not have a datetime. This throws an error when PySTAC tries to convert the retrieved item into a PySTAC Item object.

For now I am avoiding querying such items, but that will cause problems.

edit: Just to compare with an ESGF file that does have datetime: CMIP6.CMIP.CAS.FGOALS-g3.historical.r2i1p1f1.Omon.tos.gn.v20191126.tos_Omon_FGOALS-g3_historical_r2i1p1f1_gn_185001-201412.nc
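The difference between the two ids above is visible in the filenames alone: the Omon file ends in a `185001-201412` time range, while the fx file has none. A minimal sketch of a check for this, assuming the time range always appears as `_<start>-<end>.nc` (the regex and the function name `filename_time_range` are illustrative, not part of PySTAC or the ESGF indexer):

```python
import re

# Assumption: files with a time axis carry a trailing _<start>-<end>.nc,
# and time-invariant files do not.
TIME_RANGE = re.compile(r"_(\d{4,12})-(\d{4,12})\.nc")


def filename_time_range(item_id):
    """Return (start, end) as strings, or None if the id has no time range."""
    match = TIME_RANGE.search(item_id)
    return match.groups() if match else None


fx_id = (
    "CMIP6.CMIP.CAS.FGOALS-g3.historical.r1i1p1f1.fx.orog.gn.v20201202."
    "orog_fx_FGOALS-g3_historical_r1i1p1f1_gn.nc_0"
)
omon_id = (
    "CMIP6.CMIP.CAS.FGOALS-g3.historical.r2i1p1f1.Omon.tos.gn.v20191126."
    "tos_Omon_FGOALS-g3_historical_r2i1p1f1_gn_185001-201412.nc"
)

print(filename_time_range(fx_id))    # no time range in the fx filename
print(filename_time_range(omon_id))  # start and end taken from the filename
```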

gap736uk commented 2 years ago

Doesn't STAC permit you to give a null entry in the index for a missing field?

Also, be mindful that we have data from before 1 AD, where the Python datetime module hits a wall. This is a general issue with older data and affects things like Django too. There are ways around it on a technical front (see astropy for example; I have a MOLES ticket about this issue), but there is a more fundamental question, which I've discussed with NGDC, about how to mark up datasets covering pre-1 AD periods, e.g. when discussing geological timeframes [the answer has been to switch over to using controlled vocabs to mark up the time span of the data in terms of eons, eras and epochs].
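The wall in question is the standard library's lower bound on years: `datetime.MINYEAR` is 1, so year 0 and anything earlier raises `ValueError`. A small sketch to demonstrate this (the helper name `representable` is illustrative only):

```python
from datetime import MINYEAR, datetime


def representable(year):
    """Return True if the stdlib datetime type can hold 1 January of this year."""
    try:
        datetime(year, 1, 1)
        return True
    except ValueError:
        return False


print(MINYEAR)              # lowest year the datetime type accepts
print(representable(1850))  # a typical CMIP6 historical start year
print(representable(0))     # year 0 is out of range
print(representable(-500))  # so is anything earlier
```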

Mahir-Sparkess commented 2 years ago

Thanks, that seems to work. I just set start_datetime and end_datetime to None at the API end, with a check. This might fall under the ESGF indexing: I could either go through and add null datetimes to items with missing datetime properties, or keep setting them to null at the API end when they are missing.
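A rough sketch of the kind of API-side check described above, not the actual code in stac-fastapi-elasticsearch. The function name `fill_missing_datetimes` and the plain-dict item layout are assumptions made for illustration:

```python
def fill_missing_datetimes(properties, keys=("start_datetime", "end_datetime")):
    """Return a copy of an item's properties with any missing keys set to None.

    Values that are already present are left untouched.
    """
    filled = dict(properties)
    for key in keys:
        filled.setdefault(key, None)
    return filled


# Hypothetical properties for a time-invariant (fx) item and a monthly item.
fx_properties = {"variable": "orog"}
omon_properties = {
    "variable": "tos",
    "start_datetime": "1850-01-01T00:00:00Z",
    "end_datetime": "2014-12-31T00:00:00Z",
}

print(fill_missing_datetimes(fx_properties))
print(fill_missing_datetimes(omon_properties))
```

Doing this at the API end keeps the index untouched, at the cost of repeating the check on every response; adding the nulls at indexing time would move that cost to the scan.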

gap736uk commented 2 years ago

I think we've discussed this in the past, but IIRC we agreed that where STAC required a field we would have some sensible defaults to use to ensure we 'filled the box' - even if the entry was null.... then if we have content from the scan that we can put into said field then we would use that instead.

agstephens commented 2 years ago

@Mahir-Sparkess, these files in CMIP, known as "fx" files, are time-invariant fields, for example a definition of the land vs sea grid points. These don't change over time, so you can define them without the time field.

We should set the datetime fields with an appropriate null field - if there is a STAC/JSON standard for such things.
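On the JSON side, the null value is simply `null`, which Python's `json` module writes for `None`. On the STAC side, as I read the Item spec, `datetime` is a required property that may be null only when both `start_datetime` and `end_datetime` are set, so an item with all three null may still fail strict validation. A minimal sketch of that rule (the function name is illustrative, and this is not a substitute for a real STAC validator):

```python
import json


def satisfies_stac_datetime_rule(properties):
    """Check the datetime rule as read from the STAC Item spec (assumption).

    'datetime' must be present; it may be None only if both
    'start_datetime' and 'end_datetime' are present and not None.
    """
    if "datetime" not in properties:
        return False
    if properties["datetime"] is not None:
        return True
    return (
        properties.get("start_datetime") is not None
        and properties.get("end_datetime") is not None
    )


# Python None serialises to JSON null.
print(json.dumps({"datetime": None}))

ranged = {
    "datetime": None,
    "start_datetime": "1850-01-01T00:00:00Z",
    "end_datetime": "2014-12-31T00:00:00Z",
}
all_null = {"datetime": None, "start_datetime": None, "end_datetime": None}

print(satisfies_stac_datetime_rule(ranged))
print(satisfies_stac_datetime_rule(all_null))
```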

Mahir-Sparkess commented 2 years ago

Update: waiting to redeploy the production STAC so that the free-text search can be pushed to PySTAC with the required tests.