The CMIP6 dataset is a collection of climate models predicting the values of various climate-related variables over the years 2015-2100 under a set of different scenarios (best case, worst case, most likely, etc.). The models use the years 1950-2014 as a baseline.
The researchers at NASA Ames have been working on a downscaled version of the dataset in order to get a set of variables/models that all share the same geographic and temporal extent (I'll call this the Downscaled CMIP6).
The Downscaled CMIP6 comprises 35 models, to be released mid-October (11 are currently available), covering 9 different variables (+/- 2) for 2 of the scenarios (ssp245 and ssp585). The dataset has worldwide (land-only) coverage at 0.25 x 0.25 degree resolution, with daily values.
The structure of the dataset is roughly as follows:
GISS-E2-1-G/              # model name
  historical/             # baseline, years 1950-2014
    hurs/                 # variable 1
      <FILE_ID>_1950.nc
      <FILE_ID>_1951.nc
      ...
  ssp245/                 # scenario 1, years 2015-2100
  ssp585/                 # scenario 2, years 2015-2100
    hurs/                 # variable 1
      <FILE_ID>_2015.nc
      <FILE_ID>_2016.nc
      ...
In terms of getting this into the Dashboard, I have a couple of thoughts:
The baseline vs. scenario comparison seems well suited to the slider functionality.
There are 35 different models predicting 9 different variables over 2 different scenarios, which makes potentially 630 COGs for any given date - obviously way too many. Each model aims to predict the same variables, so it is valid to average the 35 models (according to Andrew from NASA Ames). This gets the number down to 18 COGs per date (9 variables over 2 scenarios). That is still quite a few, but manageable, especially if we can stick this dataset into its own dashboard for the time being.
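For reference, the COG counts above fall out of simple multiplication; a quick sketch:

```python
# Number of COGs needed per date, before and after averaging across models.
models, variables, scenarios = 35, 9, 2

per_date_raw = models * variables * scenarios  # one COG per model/variable/scenario
per_date_averaged = variables * scenarios      # models collapsed into a single mean

print(per_date_raw)       # 630
print(per_date_averaged)  # 18
```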
The point of the data is to be able to compare the values of the different models between themselves, so eventually it would be very interesting to come up with a feature that allows users to toggle the various models on and off - but that is out of scope for the time being.
Since the 18 different COGs for each date relate to 9 variables over 2 scenarios, I think it would be worth indicating the hierarchical nature of the COGs in the dashboard. As the dashboard is currently configured, we would have to display something like:
... which I don't think is very intuitive (as opposed to something where you select the variable and then toggle between the two scenarios). Again, if the deadline is too tight to come up with a design, we'll stick with what we've got.
We can process and ingest the data as it becomes available (i.e. we don't need to have all 35 models available by the mid-Oct. deadline).
The logic to open a NetCDF containing 365 days' worth of data, slice it into individual days, and then produce a COG from each is fairly straightforward, but I think it will be a very time/memory/processing-intensive process, so we will likely need to experiment with different processing strategies (ECS vs. AWS Batch vs. Lambda, etc.).
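As a rough illustration of the slicing step (not the final pipeline), here is a minimal sketch. It uses a tiny in-memory numpy array in place of a real NetCDF; in practice we'd open the file with something like xarray and write each slice out as a COG (e.g. via rio-cogeo), so those parts are stubbed out as comments.

```python
import numpy as np

# Stand-in for one yearly NetCDF variable: 365 daily grids. A real grid
# at 0.25 degree resolution would be 720 x 1440; this mock uses 4 x 8.
# In the real pipeline this would come from e.g. xarray.open_dataset(path).
year = np.arange(365 * 4 * 8, dtype="float32").reshape(365, 4, 8)

daily_slices = []
for day_index in range(year.shape[0]):
    day = year[day_index]  # a single (lat, lon) grid for one date
    daily_slices.append(day)
    # In the real pipeline, write `day` out as a COG here
    # (e.g. rio_cogeo.cog_translate) - stubbed out in this sketch.

print(len(daily_slices))      # 365
print(daily_slices[0].shape)  # (4, 8)
```

Even with lazy loading, doing this for every model/variable/scenario/year is where the time and memory pressure will come from.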
ACTION ITEMS:
We need to produce LOEs for:
Spinning up a new dashboard instance that doesn't mention "covid-19" anywhere @olafveerman @danielfdsilva
Each of the following UI options (both design and implementation): @ricardoduplos @danielfdsilva
Lowest Effort: Take the mean value of the 35 models for each scenario, integrate them into the dashboard as-is, keeping a flat structure for the datasets (present the 2 scenarios for the 9 variables as if they were 18 different datasets).
Medium Effort: Take the mean value of the 35 models for each scenario, present the 9 different variables as 9 different datasets, and allow the user to toggle between the two scenarios in the dashboard (this exposes 1 hierarchical level: variable --> scenario).
Highest Effort: Display the 9 variables, allow the user to toggle between the 2 scenarios and toggle between each of the models (including an average of the values from all of the models).
Even higher than highest effort: Dynamically correlate 2 or more of the models, displaying variance in the dashboard (according to Andrew, when comparing models it's valuable to understand where they align and where they diverge, since different methods/training data are used).
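For the averaging and variance options above, the per-pixel math is simple; a minimal numpy sketch, with tiny mock grids standing in for real model outputs:

```python
import numpy as np

# Mock outputs from 3 models for one variable/scenario/date, stacked on
# a new "model" axis. Real grids would be 720 x 1440; these are 4 x 8.
model_outputs = np.stack([
    np.full((4, 8), 10.0),
    np.full((4, 8), 12.0),
    np.full((4, 8), 14.0),
])

ensemble_mean = model_outputs.mean(axis=0)  # the averaged layer we'd serve as a COG
ensemble_var = model_outputs.var(axis=0)    # high values = models disagree there

print(ensemble_mean[0, 0])  # 12.0
print(ensemble_var[0, 0])   # variance of [10, 12, 14] = 8/3
```

The same reduction works across however many of the 35 models are available at ingest time.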