NASA-IMPACT / veda-backend

Backend services for VEDA
Other
11 stars 5 forks source link

Update datetime summaries and extents for items with datetime ranges #156

Closed anayeaye closed 1 year ago

anayeaye commented 1 year ago

What

Collection temporal extent and the summaries are based on the nominal datetime of an item which defaults to the start_dateime of an item record in pgstac. Methods that summarize the collection temporal extent and datetime instant ranges should use the end_datetime in item records to describe the full range of dates. This issue is to make changes to the user defined functions used to create summaries and extents, an additional ticket will be opened in the veda-stac-ingestor to implement the update and make any changes to the ingest API if needed.

Where to change

The SQL to summarize collection item datetime extents is in the database bootstrapper/handler. This should be updated and redeployed by incrementing the VEDA DB SCHEMA VERSION to trigger the bootstrapper.

Example SQL that demonstrates the problem

Disregard the datetime instant timezone offsets, that is a different bug being tracked in pipelines

set timezone TO 'UTC';
select 
    items.collection, 
    min(items.datetime) min_datetime,
    max(items.datetime) max_datetime,
    max(items.end_datetime) true_max_datetime,
    collections."content"->'summaries'->'datetime' summaries, 
    collections."content"->'extent'->'temporal'->'interval'->0 extent
from items 
join collections on collections.id=items.collection
where items.collection like '%houston%'
group by items.collection, collections."content";
collection min_datetime max_datetime true_max_datetime summaries extent
houston-landcover 2000-12-31 17:00:00.000 -0700 2018-12-31 17:00:00.000 -0700 2019-12-30 17:00:00.000 -0700 ["2001-01-01T00:00:00Z", "2004-01-01T00:00:00Z", "2006-01-01T00:00:00Z", "2008-01-01T00:00:00Z", "2011-01-01T00:00:00Z", "2013-01-01T00:00:00Z", "2016-01-01T00:00:00Z", "2019-01-01T00:00:00Z"] ["2001-01-01 00:00:00+00", "2019-01-01 00:00:00+00"]
houston-lst-day 1999-12-31 17:00:00.000 -0700 2009-12-31 17:00:00.000 -0700 2018-12-31 17:00:00.000 -0700 ["2000-01-01T00:00:00Z", "2010-01-01T00:00:00Z"] ["2000-01-01 00:00:00+00", "2010-01-01 00:00:00+00"]
houston-lst-night 1999-12-31 17:00:00.000 -0700 2009-12-31 17:00:00.000 -0700 2018-12-31 17:00:00.000 -0700 ["2000-01-01T00:00:00Z", "2010-01-01T00:00:00Z"] ["2000-01-01 00:00:00+00", "2010-01-01 00:00:00+00"]
houston-ndvi 1999-12-31 17:00:00.000 -0700 2009-12-31 17:00:00.000 -0700 2018-12-31 17:00:00.000 -0700 ["2000-01-01T00:00:00Z", "2010-01-01T00:00:00Z"] ["2000-01-01 00:00:00+00", "2010-01-01 00:00:00+00"]

AC

Not AC

slesaad commented 1 year ago

Two more datasets have the same problem now:

  1. lis-global-da-gpp-trend
  2. lis-global-da-tws-trend

And since we have more ingests coming up, can we prioritize this? @anayeaye

anayeaye commented 1 year ago

The datetime extents are extended to the max end_datetime in pgstac v0.7.0 so upgrading in #154 would fix the inbuilt functions used to update collection extents. The summaries methods may still need to be updated if they do not use the pgstac collection_temporal_extent.

j08lue commented 1 year ago

Following db upgrade https://github.com/NASA-IMPACT/veda-architecture/issues/249