Closed by @rplzzz 4 years ago
@kdorheim What is the story with the `units` column in the newest version of `cmip_individual`? It looks bogus. Any reason not to drop it before saving into the package data?
@rplzzz Hi, I am in the Denver airport at the moment; I'll take a look at this when I get home. Can we meet tomorrow to talk about how to divide up the tasks you need for the MCMC?
Ok, the package data for `cmip_individual` and `esm_comparison` doesn't match what is produced by `data-raw/ESM_processing_code/C.esm-comparison-data.R`. The most puzzling discrepancy is that `cmip_individual` has a "units" column:
```r
> names(hectorcal::cmip_individual)
[1] "year"        "units"       "model"       "variable"    "experiment"  "ensemble"
[7] "value"       "esmbaseline" "concbaseline"
```
However, there has never been a "units" column in the input data (the units string is read in under a column called "unit"), and in any case we've been dropping the unit column since at least January: https://github.com/kdorheim/hectorcal/blame/master/data-raw/ESM_processing_code/C.esm-comparison-data.R#L173-L177
At this point I think I'm going to rerun the data scripts and update the package data to use whatever comes out. @kdorheim, if you have time, could you compare the output of the data scripts for these two datasets to what you have on your local system and see if they are identical? If so, then we can write this off as a git mixup.
@rplzzz Can do! What is your time frame for this, so I can triage this task accordingly?
At your convenience. I'm going to proceed under the assumption that the scripts in data-raw are producing the correct data. If you find that's not the case, let me know.
Can do and will do! I should have time to look into this the second half of this week.
Ok, I guess I'm confused here. What I thought I remembered was that when we went out and got the additional CMIP data, we found that some of the new models we added were missing CO2 data toward the end of the historical period. However, when I take the intersection of all of the available years by scenario and variable, I'm not seeing any missing data.
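Roughly, the check I mean looks something like this (a sketch only; the column names follow `cmip_individual`, and the details of the actual check may differ):

```r
library(dplyr)

## Sketch of the coverage check: for each experiment/variable, count how many of
## its models report each year; any year reported by fewer models than the group
## total indicates missing data somewhere.
coverage <- hectorcal::cmip_individual %>%
  distinct(experiment, variable, model, year)

gaps <- coverage %>%
  group_by(experiment, variable) %>%
  mutate(nmodels = n_distinct(model)) %>%
  group_by(experiment, variable, year, nmodels) %>%
  summarise(nreporting = n_distinct(model), .groups = "drop") %>%
  filter(nreporting < nmodels)

nrow(gaps)  # 0 would mean no model is missing years that its peers report
```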
Update: it was one model. Found the discussion here: https://github.com/kdorheim/hectorcal/pull/26#issuecomment-530354455 The MRI-ESM1 model doesn't appear to have made it into the new package data:
```r
> filter(cmip_individual, experiment == 'esmHistorical', model == 'MRI-ESM1')
# A tibble: 0 x 8
# … with 8 variables: year <dbl>, model <chr>, ensemble <chr>, variable <chr>, experiment <chr>,
#   value <dbl>, esmbaseline <dbl>, concbaseline <dbl>
```
@kdorheim If you have a chance, could you make sure that the package data contains all of the new models you downloaded? You can add it to the draft PR that I have open.
@rplzzz Can do! Can you share the version of the files that you have on your local system, and the date they are from?
I'm working from commit 18916e1.
The last change to the ESM data appears to have been on 18 Oct:
```
commit 29d4cca
Author: Kalyn <kalyn.dorheim@pnnl.gov>
Date:   Fri Oct 18 20:32:58 2019 -0400

    clean up

M  data-raw/CMIP5_annual_global_average.csv
```
If there are some particular files you need a status for, I can give you the lowdown on those as well. I'm working from the versions checked into the repository, so you should also be able to find some information in the logs.
UPDATE: after looking into this I have uncovered some errors in my ESM_processing_code 🙄
This is turning out to be quite the nightmare; I think whenever I do this sort of thing again I am going to use `drake`.
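Something like this hypothetical `drake` plan is what I have in mind; the function names are placeholders, not the actual processing scripts:

```r
library(drake)

## Hypothetical sketch of tracking the ESM processing with drake; the functions
## build_cmip_individual() and build_esm_comparison() are placeholders.
plan <- drake_plan(
  raw_cmip   = read.csv(file_in("data-raw/CMIP5_annual_global_average.csv")),
  individual = build_cmip_individual(raw_cmip),
  comparison = build_esm_comparison(individual)
)

make(plan)  # only targets whose inputs changed get rebuilt
```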
Sorry to hear that; I've been there more than a few times myself. Let me know if there is anything I can do to help.
Okay, we have somehow both lost and gained data 🤔
Here is a kable of the data we "lost"; in some cases we only lost data for a few years.
model | variable | experiment | years_missing |
---|---|---|---|
CCSM4 | heatflux | historical | 7 |
CESM1-BGC | tas | rcp45 | 95 |
GFDL-CM3 | tas | rcp60 | 90 |
GFDL-ESM2G | co2 | rcp60 | 90 |
GFDL-ESM2G | tas | rcp60 | 90 |
GISS-E2-H | tas | rcp60 | 50 |
GISS-E2-R | tas | rcp60 | 75 |
HadCM3 | tas | rcp45 | 30 |
HadGEM2-AO | tas | historical | 146 |
HadGEM2-AO | tas | rcp26 | 95 |
HadGEM2-AO | tas | rcp45 | 95 |
HadGEM2-AO | tas | rcp60 | 95 |
HadGEM2-AO | tas | rcp85 | 95 |
MIROC-ESM-CHEM | heatflux | rcp45 | 1 |
MRI-CGCM3 | heatflux | historical | 5 |
NorESM1-ME | tas | rcp60 | 50 |
And here is a kable of the data we gained (a sketch of how such a comparison can be made follows this table).
model | variable | experiment |
---|---|---|
CanESM2 | heatflux | esmHistorical |
CanESM2 | heatflux | esmrcp85 |
CESM1-BGC | heatflux | esmHistorical |
CESM1-BGC | heatflux | esmrcp85 |
CNRM-CM5 | tas | historical |
CNRM-CM5 | tas | rcp85 |
CSIRO-Mk3-6-0 | tas | historical |
CSIRO-Mk3-6-0 | tas | rcp26 |
CSIRO-Mk3-6-0 | tas | rcp45 |
CSIRO-Mk3-6-0 | tas | rcp60 |
CSIRO-Mk3-6-0 | tas | rcp85 |
CSIRO-Mk3L-1-2 | heatflux | historical |
EC-EARTH | tas | rcp26 |
EC-EARTH | tas | rcp45 |
EC-EARTH | tas | rcp85 |
MIROC-ESM | co2 | esmHistorical |
MIROC-ESM | co2 | esmrcp85 |
MIROC-ESM | heatflux | esmHistorical |
MIROC-ESM | heatflux | esmrcp85 |
MIROC-ESM | tas | esmHistorical |
MIROC-ESM | tas | esmrcp85 |
MPI-ESM-LR | co2 | esmHistorical |
MPI-ESM-LR | co2 | esmrcp85 |
MPI-ESM-LR | tas | esmHistorical |
MPI-ESM-LR | tas | esmrcp85 |
MRI-ESM1 | co2 | esmHistorical |
MRI-ESM1 | co2 | esmrcp85 |
MRI-ESM1 | heatflux | esmHistorical |
MRI-ESM1 | heatflux | esmrcp85 |
MRI-ESM1 | tas | esmHistorical |
MRI-ESM1 | tas | esmrcp85 |
NorESM1-ME | co2 | esmHistorical |
NorESM1-ME | co2 | esmrcp85 |
NorESM1-ME | heatflux | esmHistorical |
NorESM1-ME | heatflux | esmrcp85 |
NorESM1-ME | tas | esmHistorical |
NorESM1-ME | tas | esmrcp85 |
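For reference, a rough sketch of how a lost/gained comparison like the tables above can be produced; `old_cmip` and `new_cmip` are hypothetical names for the previous and regenerated versions of the individual CMIP data:

```r
library(dplyr)
library(knitr)

## Sketch only: old_cmip and new_cmip stand in for the previous and regenerated
## versions of the individual CMIP data.
old_keys <- distinct(old_cmip, model, variable, experiment, year)
new_keys <- distinct(new_cmip, model, variable, experiment, year)

## "Lost": combinations present before but not after, summarised by missing years.
anti_join(old_keys, new_keys, by = c("model", "variable", "experiment", "year")) %>%
  count(model, variable, experiment, name = "years_missing") %>%
  kable()

## "Gained": model/variable/experiment combinations present after but not before.
anti_join(new_keys, old_keys, by = c("model", "variable", "experiment", "year")) %>%
  distinct(model, variable, experiment) %>%
  kable()
```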
Okay after much confusion I think I have arrived at our final data.
The comparison data I have now does not change the concentration max and min values, but it will change the concentration-driven mean/median. The comparison data committed in eae1d2a did not use `na.rm = TRUE` when calculating the multi-model mean and median, which caused problems with the mean temperature data. See branch `krd_issue27`.
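In other words, the multi-model summary needs something like this (a sketch; the grouping columns are assumptions, not the exact code in `C.esm-comparison-data.R`):

```r
library(dplyr)

## Sketch of the fix: without na.rm = TRUE, any model with a missing year turns
## the multi-model mean/median for that year into NA. Grouping columns assumed.
esm_summary <- cmip_individual %>%
  filter(grepl('esm', experiment)) %>%
  group_by(experiment, variable, year) %>%
  summarise(mean   = mean(value, na.rm = TRUE),
            median = median(value, na.rm = TRUE),
            .groups = "drop")
```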
Now we have emissions-driven data from the following models: `"CanESM2" "CESM1-BGC" "GFDL-ESM2M" "MRI-ESM1" "NorESM1-ME" "GFDL-ESM2G" "MIROC-ESM"`. So instead of only 3 models we have a total of 7 models!
If we compare the old vs. new emissions-driven values, we see that the max values do not change as much as the min values, especially for co2, heatflux, and tas. See branch `krd_issue27`.
Currently that branch is showing as even with master. Do you still have more to do before you push your changes?
:big_sigh: I don't exactly understand what is going on. It looks like the git command line and the RStudio git GUI are telling me different things.
@rplzzz Okay, git issue resolved, sort of (long story), but the branch `krd_issue27` does now have the commits pushed to the remote.
Honestly, I find all the GUIs to be more confusing than helpful. With the command line tools I know what it is that I asked the system to do, and I can verify that it did it.
I'll take a look at the new data today or tomorrow, and if it looks like we have everything we need, then we can close this issue out.
Awesome!
> Now we have emissions-driven data from the following models: `"CanESM2" "CESM1-BGC" "GFDL-ESM2M" "MRI-ESM1" "NorESM1-ME" "GFDL-ESM2G" "MIROC-ESM"`. So instead of only 3 models we have a total of 7 models!
However, the package data seems to have only 5, and they're not a subset of the 7 you have above:
```r
> filter(cmip_individual, experiment == 'esmrcp85') -> esmrcp85
> unique(esmrcp85$model)
[1] "bcc-csm1-1-m" "CanESM2"      "CESM1-BGC"    "GFDL-ESM2G"   "inmcm4"
```
This is with the latest version of the `krd_issue27` branch. Possibly the rda files in `data/` were stale when you committed them. Can you double check and commit the up-to-date versions if necessary?
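(If the package data objects are written with `usethis::use_data`, refreshing them would look roughly like the line below; the actual data-raw scripts may save them differently.)

```r
## Assumes the package data are saved via usethis; the real scripts may differ.
usethis::use_data(cmip_individual, esm_comparison, overwrite = TRUE)
```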
I re-ran the script that generates the comparison data and pushed the results.
```r
> cmip_individual %>% filter(grepl('esm', experiment)) %>% pull(model) %>% unique()
[1] "CanESM2"    "CESM1-BGC"  "GFDL-ESM2M" "MRI-ESM1"   "NorESM1-ME" "GFDL-ESM2G" "MIROC-ESM"
```
As we discussed on Slack, there were some issues with the comparison data, among them multiple entries for a single model / year / ensemble member. They were caused by several different problems.
Here are the models we have the emissions-driven data for:

```
  model
  <chr>
1 CanESM2
2 CESM1-BGC
3 GFDL-ESM2G
4 GFDL-ESM2M
5 MIROC-ESM
6 MRI-ESM1
7 NorESM1-ME
```
Now when I check to see if there are any duplicates, there are none! Yay! (I've also added some `assertthat` checks to make sure this doesn't happen again.)
```r
> cmip_individual %>% filter(grepl('esm', experiment)) %>% group_by(experiment, variable, model, year, ensemble) %>% summarise(year_count = n()) %>% ungroup %>% filter(year_count != 1)
# A tibble: 0 x 6
# … with 6 variables: experiment <chr>, variable <chr>, model <chr>, year <dbl>, ensemble <chr>,
#   year_count <int>
```
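The `assertthat` guard looks roughly like this (a sketch, not necessarily the exact check added to the processing scripts):

```r
library(assertthat)
library(dplyr)

## Sketch of the duplicate guard; counts rows that share a model/year/ensemble
## key within an experiment/variable and asserts that there are none.
n_dups <- cmip_individual %>%
  group_by(experiment, variable, model, year, ensemble) %>%
  filter(n() > 1) %>%
  nrow()

assert_that(n_dups == 0,
            msg = "cmip_individual contains duplicate model/year/ensemble rows")
```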
This plot compares the min/max ESM range with the individual CMIP model output. The thick black line is the multi-model mean from `esm_comparison`.
The min of the CO2 range is from MRI-ESM1, which, yes, I know is low, but the results have been published elsewhere, including Hartin et al. 2015.
When you get a chance to look at this please let me know what you think.
> This plot compares the min/max ESM range with the individual CMIP model output. The thick black line is the multi-model mean from `esm_comparison`.
This definitely looks better. One thing I noticed is there seems to be some kind of anomaly in the CO2 data at approximately 2030. It looks like CESM1-BGC has an unusually low value that year. Since it's the highest, that affects the max envelope, and the drop is large enough to produce a noticeable effect on the multi-model mean.
I'll take a closer look, and if it's only one bad value, I think we should insert a special case to just interpolate across the gap for that model. Then I should be able to update the other datasets listed above, and we can finally rerun the calcs for the paper.
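If it does turn out to be a single bad value, the special case could look something like this (a sketch; the experiment, exact year, and single-ensemble assumption would all need to be confirmed against the data):

```r
## Sketch of the special-case fix for the suspect CESM1-BGC CO2 value; the
## experiment and year here are assumptions to be confirmed by inspection.
bad_year <- 2030
idx <- with(cmip_individual,
            model == "CESM1-BGC" & variable == "co2" & experiment == "esmrcp85")
cesm <- cmip_individual[idx, ]

## Linearly interpolate the suspect value from the neighbouring years.
cesm$value[cesm$year == bad_year] <-
  approx(x = cesm$year[cesm$year != bad_year],
         y = cesm$value[cesm$year != bad_year],
         xout = bad_year)$y
```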
> The min of the CO2 range is from MRI-ESM1, which, yes, I know is low, but the results have been published elsewhere, including Hartin et al. 2015.
Sounds reasonable to me. If the rest of the community is happy with it, then who are we to dispute them?
We need to make sure that all of the package data has been updated with the latest ESM data that we acquired. It looks like `cmip_individual` and `esm_comparison` were both updated early in September, but I'm not sure whether that was before or after the last new data we got, so it wouldn't hurt to check that everything made its way in there. The other data sets appear not to have been updated recently. The data sets we need to check on are:

- `cmip_individual`
- `esm_comparison`
- `cmip_conc_pcproj`
- `cmip_emiss_pcproj`
- `conc_pc_comparison`
- `emiss_pc_comparison`