NASA-IMPACT / veda-data-pipelines

data transformation - ingestion - publication pipelines to support VEDA

Publish and Configure EPA emissions datasets #155

Closed abarciauskas-bgse closed 2 years ago

abarciauskas-bgse commented 2 years ago

Acceptance criteria

Steps

Identify the dataset and where it will be accessed from.

s3://veda-data-store-staging/EIS/cog/EPA-inventory-2012

Design the metadata and publish to the Dev API

  1. Add collection metadata
  2. Run the cloud-optimized-data-pipelines to publish to DEV
  3. Open a PR for the collection metadata and to confirm that the metadata in the DEV DB looks OK
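
As a rough illustration of the "add collection metadata" step, the sketch below assembles a minimal STAC Collection document as a Python dict. It only uses fields required by the STAC Collection specification; the helper name `build_collection` and every value passed to it in the usage example (title, description, license, bounding box, dates) are placeholders assumed for illustration, not the metadata that was actually published.

```python
import json


def build_collection(collection_id, title, description, license_id, bbox, start, end):
    """Return a minimal STAC Collection dict (hypothetical helper, not pipeline code)."""
    return {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": collection_id,
        "title": title,
        "description": description,
        "license": license_id,
        "extent": {
            # STAC expects a list of bounding boxes and a list of intervals.
            "spatial": {"bbox": [list(bbox)]},
            "temporal": {"interval": [[start, end]]},
        },
        "links": [],
    }


if __name__ == "__main__":
    # All argument values below are placeholders, not real dataset metadata.
    collection = build_collection(
        collection_id="EPA-annual-emissions_1A_Combustion_Mobile",
        title="PLACEHOLDER title",
        description="PLACEHOLDER description",
        license_id="PLACEHOLDER-license",
        bbox=(-180.0, -90.0, 180.0, 90.0),
        start="2012-01-01T00:00:00Z",
        end="2012-12-31T23:59:59Z",
    )
    print(json.dumps(collection, indent=2))
```

The title and description are left as parameters because, per the thread, that content has to come from the science team.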

Publish to the Staging API

Once the PR is approved, we can merge and publish those datasets to the Staging API

abarciauskas-bgse commented 2 years ago

@slesaad @xhagrg could you review this ticket and update it with any additional details about running the pipeline?

slesaad commented 2 years ago

This dataset has been selected for testing with the new workflows API that's in development, so it's blocked until then.

slesaad commented 2 years ago

Based on the files available in the veda-data-store-staging bucket, see:

% aws s3 ls s3://veda-data-store-staging/EIS/cog/EPA-inventory-2012/ --summarize
                           PRE annual/
                           PRE daily/
                           PRE monthly/

Total Objects: 0
   Total Size: 0
% aws s3 ls s3://veda-data-store-staging/EIS/cog/EPA-inventory-2012/annual/ --summarize
2022-07-08 14:29:53     520801 EPA-annual-emissions_1A_Combustion_Mobile.tif
2022-07-08 14:29:54     523002 EPA-annual-emissions_1A_Combustion_Stationary.tif
2022-07-08 14:29:54      27863 EPA-annual-emissions_1B1a_Abandoned_Coal.tif
2022-07-08 14:29:54      18895 EPA-annual-emissions_1B1a_Coal_Mining_Surface.tif
2022-07-08 14:29:54      15428 EPA-annual-emissions_1B1a_Coal_Mining_Underground.tif
2022-07-08 14:29:54     127662 EPA-annual-emissions_1B2a_Petroleum.tif
2022-07-08 14:29:54     520416 EPA-annual-emissions_1B2b_Natural_Gas_Distribution.tif
2022-07-08 14:29:54      41289 EPA-annual-emissions_1B2b_Natural_Gas_Processing.tif
2022-07-08 14:29:54     123855 EPA-annual-emissions_1B2b_Natural_Gas_Production.tif
2022-07-08 14:29:54     252448 EPA-annual-emissions_1B2b_Natural_Gas_Transmission.tif
2022-07-08 14:29:55      10507 EPA-annual-emissions_2B5_Petrochemical_Production.tif
2022-07-08 14:29:55       7944 EPA-annual-emissions_2C2_Ferroalloy_Production.tif
2022-07-08 14:29:55     490541 EPA-annual-emissions_4A_Enteric_Fermentation.tif
2022-07-08 14:29:55     501517 EPA-annual-emissions_4B_Manure_Management.tif
2022-07-08 14:29:55      26370 EPA-annual-emissions_4C_Rice_Cultivation.tif
2022-07-08 14:29:55     179259 EPA-annual-emissions_4F_Field_Burning.tif
2022-07-08 14:29:55     209251 EPA-annual-emissions_5_Forest_Fires.tif
2022-07-08 14:29:55      69045 EPA-annual-emissions_6A_Landfills_Industrial.tif
2022-07-08 14:29:55      95705 EPA-annual-emissions_6A_Landfills_Municipal.tif
2022-07-08 14:29:56     186311 EPA-annual-emissions_6B_Wastewater_Treatment_Domestic.tif
2022-07-08 14:29:55      80227 EPA-annual-emissions_6B_Wastewater_Treatment_Industrial.tif
2022-07-08 14:29:56      89020 EPA-annual-emissions_6D_Composting.tif

% aws s3 ls s3://veda-data-store-staging/EIS/cog/EPA-inventory-2012/monthly/ --summarize      
                           PRE emissions_1A_Combustion_Stationary/
                           PRE emissions_1B2a_Petroleum/
                           PRE emissions_1B2b_Natural_Gas_Production/
                           PRE emissions_4B_Manure_Management/
                           PRE emissions_4C_Rice_Cultivation/
                           PRE emissions_4F_Field_Burning/

Total Objects: 0
   Total Size: 0
% aws s3 ls s3://veda-data-store-staging/EIS/cog/EPA-inventory-2012/daily/ --summarize
                           PRE emissions_5_Forest_Fires/

Total Objects: 0
   Total Size: 0

Given this layout, it was decided that there would be multiple collections:

  1. annual: 22 collections
  2. monthly: 6 collections
  3. daily: 1 collection

So a total of 29 collections.
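
The count can be checked with a short sketch that derives one collection ID per sector and cadence from the listing above. The `EPA-<cadence>-emissions_<sector>` pattern matches the collection IDs that appear in this thread; the helper names `collection_id` and `all_collection_ids` are assumptions made for illustration and are not part of the pipeline.

```python
# Sector names taken from the S3 listing: annual .tif files and
# monthly/daily prefixes (with the "emissions_" part stripped).
ANNUAL_SECTORS = [
    "1A_Combustion_Mobile", "1A_Combustion_Stationary", "1B1a_Abandoned_Coal",
    "1B1a_Coal_Mining_Surface", "1B1a_Coal_Mining_Underground", "1B2a_Petroleum",
    "1B2b_Natural_Gas_Distribution", "1B2b_Natural_Gas_Processing",
    "1B2b_Natural_Gas_Production", "1B2b_Natural_Gas_Transmission",
    "2B5_Petrochemical_Production", "2C2_Ferroalloy_Production",
    "4A_Enteric_Fermentation", "4B_Manure_Management", "4C_Rice_Cultivation",
    "4F_Field_Burning", "5_Forest_Fires", "6A_Landfills_Industrial",
    "6A_Landfills_Municipal", "6B_Wastewater_Treatment_Domestic",
    "6B_Wastewater_Treatment_Industrial", "6D_Composting",
]
MONTHLY_SECTORS = [
    "1A_Combustion_Stationary", "1B2a_Petroleum", "1B2b_Natural_Gas_Production",
    "4B_Manure_Management", "4C_Rice_Cultivation", "4F_Field_Burning",
]
DAILY_SECTORS = ["5_Forest_Fires"]


def collection_id(cadence: str, sector: str) -> str:
    """Build a collection ID such as EPA-annual-emissions_1A_Combustion_Mobile."""
    return f"EPA-{cadence}-emissions_{sector}"


def all_collection_ids() -> list[str]:
    """Return every collection ID, one per (cadence, sector) pair."""
    by_cadence = {
        "annual": ANNUAL_SECTORS,
        "monthly": MONTHLY_SECTORS,
        "daily": DAILY_SECTORS,
    }
    return [
        collection_id(cadence, sector)
        for cadence, sectors in by_cadence.items()
        for sector in sectors
    ]


if __name__ == "__main__":
    ids = all_collection_ids()
    print(len(ids))  # 22 + 6 + 1
```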

Sent an email to the science team asking for information about the dataset (title and description); waiting on a reply.

A sample collection has been published to dev: https://dev-stac.delta-backend.com/collections/EPA-annual-emissions_1A_Combustion_Mobile

All the collection JSONs and step-function inputs have already been set up. We are just waiting on that information from the science team to kick off the ingests.

aboydnw commented 2 years ago

@aboydnw to reach out to Kevin and Lesley for EPA dataset overview content

slesaad commented 2 years ago

All the datasets are now available at https://dev-stac.delta-backend.com/; there are a total of 29 collections. This is to be our new staging STAC API.

danielfdsilva commented 2 years ago

@aboydnw Seems that these items would go under the EIS thematic area, or should it be a different one?

Also, some seem related so I'd group them as follows:

EPA-annual-emissions_4B_Manure_Management
EPA-monthly-emissions_4B_Manure_Management

EPA-annual-emissions_1B2b_Natural_Gas_Processing
EPA-annual-emissions_1B2b_Natural_Gas_Production
EPA-monthly-emissions_1B2b_Natural_Gas_Production
EPA-annual-emissions_1B2b_Natural_Gas_Transmission
EPA-annual-emissions_1B2b_Natural_Gas_Distribution

EPA-annual-emissions_1B1a_Coal_Mining_Underground
EPA-annual-emissions_1B1a_Coal_Mining_Surface

EPA-annual-emissions_1A_Combustion_Mobile
EPA-annual-emissions_1A_Combustion_Stationary
EPA-monthly-emissions_1A_Combustion_Stationary

EPA-annual-emissions_6B_Wastewater_Treatment_Domestic
EPA-annual-emissions_6B_Wastewater_Treatment_Industrial

EPA-annual-emissions_6A_Landfills_Industrial
EPA-annual-emissions_6A_Landfills_Municipal

EPA-annual-emissions_4C_Rice_Cultivation
EPA-monthly-emissions_4C_Rice_Cultivation

EPA-annual-emissions_4F_Field_Burning
EPA-monthly-emissions_4F_Field_Burning

EPA-annual-emissions_1B2a_Petroleum
EPA-monthly-emissions_1B2a_Petroleum

EPA-annual-emissions_5_Forest_Fires
EPA-annual-emissions_5_Forest_Fire
EPA-daily-emissions_5_Forest_Fires

@slesaad both EPA-annual-emissions_5_Forest_Fires and EPA-annual-emissions_5_Forest_Fire exist. Is this intended?

These stand alone, and each would be its own dataset:

EPA-annual-emissions_6D_Composting

EPA-annual-emissions_1B1a_Abandoned_Coal

EPA-annual-emissions_2B5_Petrochemical_Production

EPA-annual-emissions_2C2_Ferroalloy_Production

EPA-annual-emissions_4A_Enteric_Fermentation
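
The grouping above was done by hand. One heuristic that happens to reproduce it is to key each collection on its sector code plus the first word of the sector name (for example `1B2b_Natural` or `1B1a_Coal`). The sketch below shows that idea; the helper names `group_key` and `group_collections` are assumptions for illustration, and the rule is not something the dashboard configuration is known to use.

```python
from collections import defaultdict


def group_key(collection_id: str) -> str:
    """Return '<code>_<first word>' for a collection ID.

    'EPA-monthly-emissions_1B2b_Natural_Gas_Production' -> '1B2b_Natural'
    """
    sector = collection_id.split("emissions_", 1)[1]
    code, first_word = sector.split("_")[:2]
    return f"{code}_{first_word}"


def group_collections(collection_ids):
    """Group collection IDs that share a sector code and leading word."""
    groups = defaultdict(list)
    for cid in collection_ids:
        groups[group_key(cid)].append(cid)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "EPA-annual-emissions_1B1a_Coal_Mining_Underground",
        "EPA-annual-emissions_1B1a_Coal_Mining_Surface",
        "EPA-annual-emissions_1B1a_Abandoned_Coal",
        "EPA-annual-emissions_4B_Manure_Management",
        "EPA-monthly-emissions_4B_Manure_Management",
    ]
    for key, members in group_collections(sample).items():
        print(key, members)
```

Keying on the code alone would merge `1B1a_Abandoned_Coal` with the two coal mining collections, which is why the first word is included.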

Do we have information about what color scale and rescale parameters we should be using?

slesaad commented 2 years ago

@danielfdsilva yeah, they do go under EIS thematic area. The two forest fires collections were not intended; sorry, my bad 😅 I'll delete EPA-annual-emissions_5_Forest_Fire, you can use the other one for the configuration.
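
An accidental duplicate like this one can be caught before configuration with a simple check for collection IDs that differ only by a trailing "s". The sketch below is a minimal example of such a check; the function name `plural_duplicates` is hypothetical and nothing like it is known to exist in the pipeline.

```python
def plural_duplicates(collection_ids):
    """Return (singular, plural) ID pairs that differ only by a trailing 's'."""
    known = set(collection_ids)
    return sorted((cid, cid + "s") for cid in known if cid + "s" in known)


if __name__ == "__main__":
    ids = [
        "EPA-annual-emissions_5_Forest_Fires",
        "EPA-annual-emissions_5_Forest_Fire",
        "EPA-daily-emissions_5_Forest_Fires",
    ]
    for singular, plural in plural_duplicates(ids):
        print(f"possible duplicate: {singular} vs {plural}")
```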

@aboydnw was in conversation with the science team about the content for configuration, maybe the color scale/rescale information should come from them too?

aboydnw commented 2 years ago

ah @slesaad @danielfdsilva I forgot about the color scale and rescale parameters. Would it be easiest for us to just have a meeting with the GHG scientists, since we don't have the trilateral dashboard to go off of this time?

slesaad commented 2 years ago

@danielfdsilva, could you use https://5vr61n5bo5.execute-api.us-west-2.amazonaws.com/ for the STAC API URL instead? That's actually the new to-be staging API. (Just for testing, definitely don't want to merge this URL.)

aboydnw commented 2 years ago

@danielfdsilva does this page give enough guidance on color scale? https://www.epa.gov/ghgemissions/gridded-2012-methane-emissions#data

danielfdsilva commented 2 years ago

@aboydnw The scale looks simple and is always the same. I'd say we can try and see

slesaad commented 2 years ago

This is done and appears in the dashboard: https://www.earthdata.nasa.gov/dashboard/eis/datasets