Closed: @abarciauskas-bgse closed this issue 2 years ago.
@abarciauskas-bgse the link to the sample file says it's been deleted. Do you have it by any chance?
@slesaad thanks for looking at this; the link is updated.
@sahmad3 is taking over the story development from the EIS Freshwater team side. He sent some additional details via email that I am adding here:
Visualizing the LIS data is what we are focusing on, but at the global scale. The model outputs are ready for around 20 years. The [rest of the NetCDF files] are on Discover, and Brendan copied them over to the SMCE S3 bucket (eis-dh-hydro/LIS_NETCDF/DA_10km/GLOBAL/SURFACEMODEL/*/LIS_HIST.nc).
We are waiting on the rest of the data to be processed for the anomaly dataset.
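For reference, a minimal sketch for listing those NetCDF files under the prefix above with boto3 (it assumes SMCE credentials are already configured; the bucket and prefix are taken from the path Brendan shared):

```python
import boto3

# Assumes AWS credentials for the SMCE account are already configured
# (e.g. via a named profile or environment variables).
s3 = boto3.client("s3")

paginator = s3.get_paginator("list_objects_v2")
pages = paginator.paginate(
    Bucket="eis-dh-hydro",
    Prefix="LIS_NETCDF/DA_10km/GLOBAL/SURFACEMODEL/",
)

for page in pages:
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".nc"):
            print(key, obj["Size"])
```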
As discussed in sprint planning today, @slesaad will check that everything that can be published has been published, and we will create a new ticket for the issues with other LIS COGs (the empty data mentioned by @anayeaye).
The datasets are published to the STAC API. The remaining task is captured in #192, so this issue is being closed!
NOTE: The dataset ingest + publication workflows are currently undergoing a refactor in this branch: https://github.com/NASA-IMPACT/cloud-optimized-data-pipelines/tree/refactor
Brendan McAndrew is one of the science leads on the freshwater team. We will be helping the freshwater team convert this dataset to COGs and publish them so that the freshwater story on Midwest Flooding can be told in the new climate dashboard.
1. Identify the dataset and its processing needs
Brendan McAndrew shared a sample NetCDF file: https://drive.google.com/file/d/1i8-hEa2jl4E36TK78fIMUMvh_RHKOR-T/view?usp=sharing, which we can use to test COG conversion.
More from Brendan:
2. Create COG conversion code and verify the COG output with a data product expert (for example, someone at the DAAC that hosts the native format) by sharing it in a visual interface.
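A minimal conversion sketch, assuming the LIS_HIST files expose 2D gridded variables that rioxarray can read; the variable name, dimension names, and CRS below are placeholders to confirm against the sample file:

```python
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray objects)
import xarray as xr
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles


def netcdf_to_cog(nc_path: str, variable: str, out_path: str) -> None:
    """Convert one variable of a LIS NetCDF file to a Cloud Optimized GeoTIFF."""
    ds = xr.open_dataset(nc_path)
    da = ds[variable]

    # Placeholder assumptions: a single time step and a lat/lon grid in
    # EPSG:4326; verify both against the sample LIS_HIST file.
    if "time" in da.dims:
        da = da.isel(time=0)
    da = da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")
    da = da.rio.write_crs("EPSG:4326")

    # Write a plain GeoTIFF first, then rewrite it as a tiled, overviewed COG.
    tmp_tif = out_path + ".tmp.tif"
    da.rio.to_raster(tmp_tif)
    cog_translate(tmp_tif, out_path, cog_profiles.get("deflate"))


# Hypothetical variable name for illustration:
# netcdf_to_cog("LIS_HIST.nc", "SoilMoist_tavg", "LIS_HIST_SoilMoist_tavg.tif")
```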
3. Design the metadata and publish to the Dev API
A collection will need a standard set of fields. Some of them may be self-evident from the filename or an about page for the product; however, in many cases we may need to reach out to product owners to define the right values.
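As an illustration of the kind of fields involved (not the checklist for this product), a minimal collection built with pystac might look like the following; every value is a placeholder to confirm with the product owners:

```python
from datetime import datetime, timezone

import pystac

# All values below are placeholders; confirm with the EIS Freshwater team and
# the product documentation before publishing.
collection = pystac.Collection(
    id="lis-global-da-10km",
    title="LIS Global 10 km Data Assimilation Output",
    description="Cloud Optimized GeoTIFFs derived from LIS_HIST NetCDF model output.",
    license="proprietary",  # placeholder until the actual license is confirmed
    extent=pystac.Extent(
        spatial=pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]]),
        temporal=pystac.TemporalExtent(
            [[datetime(2000, 1, 1, tzinfo=timezone.utc), None]]
        ),
    ),
)

print(collection.to_dict())
```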
After reviewing the STAC documentation for collections and items, and reviewing existing scripts for generating collection metadata (generally with SQL) and item metadata, generate or reuse scripts for your collection and a few items to publish to the testing API. Documentation and examples for generating a pipeline, or otherwise documenting your dataset workflow, are in https://github.com/NASA-IMPACT/cloud-optimized-data-pipelines. We would like to maintain the scripts folks use to publish datasets in that repo so we can easily re-run the ingest and publish workflows for those datasets if necessary.
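For item metadata, one option (an assumption here, not necessarily what the existing pipelines use) is rio-stac, which reads the spatial metadata straight from the COG; the file name, collection id, and datetime are placeholders:

```python
from datetime import datetime, timezone

from rio_stac import create_stac_item

# Placeholder path and collection id; the item datetime would normally be
# parsed from the LIS filename or the NetCDF time coordinate.
item = create_stac_item(
    "LIS_HIST_SoilMoist_tavg.tif",
    collection="lis-global-da-10km",
    input_datetime=datetime(2001, 1, 1, tzinfo=timezone.utc),
    with_proj=True,
    asset_name="cog_default",
    asset_media_type="image/tiff; application=geotiff; profile=cloud-optimized",
)

print(item.to_dict())
```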
If necessary, request access and credentials for the dev database, then ingest and publish to the Dev API. Submit a PR with the manual or CDK scripts used to run the workflow, and include links to the datasets published in the Dev API.
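The exact ingest mechanism depends on the pipeline, but a rough sketch of POSTing the collection and item JSON to a STAC Transactions-style endpoint would look like this; the URL, auth header, and file names are placeholders that come with the credentials requested above:

```python
import json

import requests

# Placeholder endpoint and token; replace with the real Dev API values.
DEV_STAC_API = "https://dev-stac-api.example.com"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

with open("collection.json") as f:
    collection = json.load(f)

resp = requests.post(f"{DEV_STAC_API}/collections", headers=HEADERS, json=collection)
resp.raise_for_status()

with open("item.json") as f:
    item = json.load(f)

resp = requests.post(
    f"{DEV_STAC_API}/collections/{collection['id']}/items",
    headers=HEADERS,
    json=item,
)
resp.raise_for_status()
```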
4. Publish to the Staging API
Once the PR is approved, we can merge it and publish those datasets to the Staging API.