NASA-IMPACT / veda-data-pipelines

data transformation - ingestion - publication pipelines to support VEDA

Add Global Relative Deprivation Index (GRDI) to STAC API #100

Closed abarciauskas-bgse closed 2 years ago

abarciauskas-bgse commented 2 years ago

Files were shared by Juan F. Martinez (jmartine@ciesin.columbia.edu): https://drive.google.com/drive/folders/1DLiRaRqyNyqJjJs5t94BpY8OLB6bQIwz

Details on the dataset are here: https://drive.google.com/file/d/1U20_4rqAw_tqzI7AI_FWpI0rCiL5KLQ9/view?usp=sharing

ingalls commented 2 years ago

Delivered data has been uploaded to:

s3://climatedashboard-data/delivery/GRDI/2022-04-17.zip

ingalls commented 2 years ago

Initial COGs can be found at:

s3://climatedashboard-data/default/

abarciauskas-bgse commented 2 years ago

@ingalls probably those should go in s3://climatedashboard-data/global_relative_deprivation_index/ (or whatever collection name you think is appropriate as a STAC ID)

xhagrg commented 2 years ago

What are the dates for these files, @abarciauskas-bgse?

abarciauskas-bgse commented 2 years ago

As noted in Slack, the dates should be in Table 1 of https://docs.google.com/document/d/1Taq1UIYd5NgyR8Q8X6utinEqF4ysqfzb/edit

And as also discussed on Slack, each index will be a separate collection (similar to SVI).

abarciauskas-bgse commented 2 years ago

Juan got back to us with this information:

The GRDIv1 data is still in the Alpha phase so the information you're showing me is outdated. We will be releasing the data soon through the SEDAC website. Please view the updated information in this shared drive: https://drive.google.com/drive/folders/1DLiRaRqyNyqJjJs5t94BpY8OLB6bQIwz?usp=sharing

To answer your question, most of them are between 2010-2021. We used a wide variety of input datasets with different dates. The Open Street Map buildings dataset updates regularly, so it's hard to put a date on it, but we accessed it in 2021. Please see below and let me know if I can help with anything else:

povmap-grdi-v1.tif - Global Gridded Relative Deprivation Index (GRDI), v1 raster. 2010-2021
povmap-grdi-v1_BUILT_index.tif - BUILT Constituent raster, indexed 0 to 100. 2015-2021
povmap-grdi-v1_CDR.tif - CDR Constituent raster. 2010
povmap-grdi-v1_IMR.tif - IMR Constituent raster. 2015
povmap-grdi-v1_SHDI.tif - SHDI Constituent raster. 2018
povmap-grdi-v1_VNL-2020.tif - VNL 2020 Constituent raster. 2020
povmap-grdi-v1_VNL-slope.tif - VNL Slope Constituent raster. 2012-2020
povmap-grdi-v1_FilledMissingValues-Count.tif - Raster showing count of constituent inputs that were filled in per cell using the Fill Missing Values tool. 2010-2021

@xhagrg apologies, you will have to re-upload these files to S3.
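For reference, the date ranges listed above could be turned into STAC-style `start_datetime` / `end_datetime` properties roughly as follows. This is a minimal sketch, not the pipeline's actual code: expanding a year range to whole calendar years (1 January through 31 December, UTC) is an assumption, and the helper names `year_range_to_datetimes` and `temporal_properties` are hypothetical.

```python
from datetime import datetime, timezone

# Year ranges exactly as reported in the message above.
GRDI_FILES = {
    "povmap-grdi-v1.tif": (2010, 2021),
    "povmap-grdi-v1_BUILT_index.tif": (2015, 2021),
    "povmap-grdi-v1_CDR.tif": (2010, 2010),
    "povmap-grdi-v1_IMR.tif": (2015, 2015),
    "povmap-grdi-v1_SHDI.tif": (2018, 2018),
    "povmap-grdi-v1_VNL-2020.tif": (2020, 2020),
    "povmap-grdi-v1_VNL-slope.tif": (2012, 2020),
    "povmap-grdi-v1_FilledMissingValues-Count.tif": (2010, 2021),
}

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def year_range_to_datetimes(start_year, end_year):
    """Expand a year range to whole calendar years in UTC (an assumption)."""
    start = datetime(start_year, 1, 1, tzinfo=timezone.utc)
    end = datetime(end_year, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    return start, end


def temporal_properties(filename):
    """Return start/end timestamp strings for one of the files listed above."""
    start, end = year_range_to_datetimes(*GRDI_FILES[filename])
    return {
        "start_datetime": start.strftime(TIMESTAMP_FORMAT),
        "end_datetime": end.strftime(TIMESTAMP_FORMAT),
    }
```

Single-year files (CDR, IMR, SHDI, VNL 2020) simply get a range covering that one year under this convention.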

xhagrg commented 2 years ago

@abarciauskas-bgse what values do we use for multi-year date ranges in dashboard:time_density?

abarciauskas-bgse commented 2 years ago

@anayeaye what do you think?

anayeaye commented 2 years ago

It looks like these would be time_density=null, which will inform the front end not to render a time picker.
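As an illustration of the suggestion above, a Python `None` serializes to JSON `null`, so a collection record could carry the field like this. Only the field name `dashboard:time_density` comes from this thread; the rest of the record, the `wants_time_picker` helper, and the example value `"year"` are hypothetical and do not reflect the dashboard's actual client code.

```python
import json

# Hypothetical collection fragment; only "dashboard:time_density"
# is taken from the discussion above.
collection = {
    "id": "grdi-v1-raster",
    "dashboard:time_density": None,
}


def wants_time_picker(collection):
    """Hypothetical client-side check: no time density means no time picker."""
    return collection.get("dashboard:time_density") is not None


# None becomes null in the serialized JSON.
serialized = json.dumps(collection)
```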

xhagrg commented 2 years ago

@ingalls @anayeaye @abarciauskas-bgse

https://dev-stac.delta-backend.xyz/collections/grdi-v1-raster/items
https://dev-stac.delta-backend.xyz/collections/grdi-v1-built/items
https://dev-stac.delta-backend.xyz/collections/grdi-imr-raster/items
https://dev-stac.delta-backend.xyz/collections/grdi-shdi-raster/items
https://dev-stac.delta-backend.xyz/collections/grdi-vnl-raster/items
https://dev-stac.delta-backend.xyz/collections/grdi-vnl-slope-raster/items
https://dev-stac.delta-backend.xyz/collections/grdi-cdr-raster/items
https://dev-stac.delta-backend.xyz/collections/grdi-filled-missing-values-count/items

abarciauskas-bgse commented 2 years ago

LGTM @xhagrg thank you! 🙌🏽

abarciauskas-bgse commented 2 years ago

@anayeaye what do you typically look for when reviewing collection and item metadata? I was just checking that the naming looks reasonable, since I know that is important for identifying the different indexes in client applications. I would also check that the dates and spatial extents look reasonable, but I trust that the create_stac_item call generates the right statistics and other values.
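The date and extent checks mentioned above could be scripted along these lines. This is only a sketch under stated assumptions: the `qc_item` helper is hypothetical, it expects a 2D `bbox` ordered west, south, east, north, and it accepts only whole-second UTC timestamps ending in `Z`, so timestamps with fractional seconds or offsets would be flagged.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def qc_item(item):
    """Return a list of human-readable problems found in a STAC item dict."""
    problems = []

    # Spatial extent: expect [west, south, east, north] in degrees.
    bbox = item.get("bbox")
    if not bbox or len(bbox) != 4:
        problems.append("missing or non-2D bbox")
    else:
        west, south, east, north = bbox
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            problems.append("longitude out of range")
        if not (-90 <= south <= north <= 90):
            problems.append("latitude out of range or inverted")

    # Temporal fields: at least one timestamp, and all must parse.
    props = item.get("properties", {})
    keys = ("datetime", "start_datetime", "end_datetime")
    stamps = [props[k] for k in keys if props.get(k)]
    if not stamps:
        problems.append("no datetime, start_datetime, or end_datetime")
    for stamp in stamps:
        try:
            datetime.strptime(stamp, TIMESTAMP_FORMAT)
        except ValueError:
            problems.append(f"unparseable timestamp: {stamp}")

    return problems
```

An empty list means the item passed these basic checks; it says nothing about whether the statistics themselves are right.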

anayeaye commented 2 years ago

@abarciauskas-bgse @xhagrg RE QC: I was comparing the fields to our Collection and Item specs for a while but that's not sustainable. Maybe the formal QC process could be a copy-edit read over the collection and an item and then confirming that summaries can be created?

Since these GRDI collections are all cog_default collections, I QC'd by running the summary function in postgres (SELECT dashboard.update_all_default_summaries();) and then checked that the summary objects looked reasonable. That shows that the collection has item_assets and dashboard properties and that the datetimes for the items are properly ingested.

These all look good to me, but one collection stood out and I don't see why yet. grdi-v1-raster has an item that is found by the stac-api and looks OK to me, but the collection summary function failed. The items search works fine, so I am stumped at the moment: https://dev-stac.delta-backend.xyz/collections/grdi-v1-raster/items.
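As a cross-check outside the database, a summary of this kind could be approximated from the item JSON itself. This is a rough sketch and not the logic of the postgres function: it assumes each item stores per-band statistics under the asset's `raster:bands[].statistics` with `minimum` and `maximum` keys (the STAC raster extension layout), which may not match how these items were actually ingested. The `summarize_items` name is hypothetical.

```python
def summarize_items(items, asset_key="cog_default"):
    """Collect distinct datetimes and the overall min/max for one asset key."""
    datetimes = set()
    minimums, maximums = [], []

    for item in items:
        stamp = item.get("properties", {}).get("datetime")
        if stamp:
            datetimes.add(stamp)

        # Assumed layout: statistics live on each raster band of the asset.
        asset = item.get("assets", {}).get(asset_key, {})
        for band in asset.get("raster:bands", []):
            stats = band.get("statistics", {})
            if "minimum" in stats and "maximum" in stats:
                minimums.append(stats["minimum"])
                maximums.append(stats["maximum"])

    summaries = {"datetime": sorted(datetimes)}
    if minimums:
        summaries[asset_key] = {"min": min(minimums), "max": max(maximums)}
    return {"summaries": summaries}
```

If the result has no entry for the asset key, the items either lack that asset or lack statistics in the assumed location, which is worth ruling out before debugging the summary function.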

abarciauskas-bgse commented 2 years ago

Thanks @anayeaye - is it worth adding a ticket for the failure of the summary function for grdi-v1-raster? Sounds like otherwise these collections are good to go for now and we should consider documenting steps for manual QA and also which of those steps can be automated to be done as part of our automated data pipelines.

xhagrg commented 2 years ago

I will retry re-ingesting the collections and we can check again.

xhagrg commented 2 years ago

I re-ingested the data / collection details for grdi-v1-raster.

anayeaye commented 2 years ago

After trying the update again (it didn't produce summaries), I ran the summary SQL directly and see that grdi-v1-raster should have a summary that looks like this, so the problem might be in the update_default_summaries UDF.

{
  "summaries": {
    "datetime": [
      "2010-01-01T00:00:00Z"
    ],
    "cog_default": {
      "max": 99.66927337646484,
      "min": 1.1262484788894653
    }
  }
}

anayeaye commented 2 years ago

^ Weird, now it works 2 minutes later. All GRDI collections now have default summaries. Things look good!

abarciauskas-bgse commented 2 years ago

Believe this is also done