Open abby-baskind opened 3 years ago
Hi @abby-baskind, you should be able to see if these are different at all by listing the keys of the dataset dictionary.
Can you paste the output of
print(list([k for k in dd_thetao.keys() if 'CanESM' in k]))
These keys are a combination of the unique attributes of each dataset (I just filtered for the CanESM models, but you can remove the if ... statement and see the keys for all models).
My suspicion: These are different members of the same model. Many of the CMIP6 models are run as an ensemble, which means that several experiments are branched off the original piControl run at different times, but then are run with the same forcing (here the historical
forcing). This enables us to get an idea of how much of a given signal is internal variability (differences between members) vs. forced variability (the common signal between members).
If it turns out that my suspicion is right, you might for now just want to work with a single member per model (and add complexity later). You can find an example of how to do that with cmip6_preprocessing here.
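To illustrate the "single member per model" idea, here is a minimal sketch. It assumes every dataset carries source_id and variant_label in its attrs, and that variant_label follows the usual CMIP6 r/i/p/f pattern. The helper names (member_sort_key, pick_first_member) and the stand-in objects are made up for this example; this is not the cmip6_preprocessing example linked above.

```python
import re
from types import SimpleNamespace


def member_sort_key(variant_label):
    """Turn e.g. 'r10i1p1f2' into (10, 1, 1, 2) so members sort numerically."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)f(\d+)", variant_label)
    if match is None:
        raise ValueError(f"unexpected variant_label: {variant_label!r}")
    return tuple(int(n) for n in match.groups())


def pick_first_member(ddict):
    """Keep one entry per source_id: the one with the lowest variant_label."""
    best = {}
    for key, ds in ddict.items():
        model = ds.attrs["source_id"]
        rank = member_sort_key(ds.attrs["variant_label"])
        if model not in best or rank < best[model][0]:
            best[model] = (rank, key)
    return {key: ddict[key] for _, key in best.values()}


# Stand-in objects for illustration only; real entries would be xarray Datasets.
fake = {
    "a": SimpleNamespace(attrs={"source_id": "CanESM5", "variant_label": "r10i1p1f1"}),
    "b": SimpleNamespace(attrs={"source_id": "CanESM5", "variant_label": "r2i1p1f1"}),
    "c": SimpleNamespace(attrs={"source_id": "CESM2", "variant_label": "r1i1p1f1"}),
}
print(sorted(pick_first_member(fake)))  # ['b', 'c']
```

Sorting on the parsed integers rather than on the raw string matters here: as plain strings, 'r10i1p1f1' would sort before 'r2i1p1f1'.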
hey @jbusecke i'm getting an error importing postprocessing
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-8-3a65b39712db> in <module>
8 import gsw
9 warnings.filterwarnings("ignore")
---> 10 from cmip6_preprocessing.postprocessing import combine_datasets
ModuleNotFoundError: No module named 'cmip6_preprocessing.postprocessing'
not sure what to do about this since cmip6_preprocessing is already installed...
This means that the installed version of cmip6_preprocessing is old. You can confirm that by running:
import cmip6_preprocessing
print(cmip6_preprocessing.__version__)
Updating dependencies can be quite a pain on the pangeo cloud, but I think this one here should be the easiest:
You can run this line in a terminal:
mamba update cmip6_preprocessing -y
Or run this from a notebook cell:
! mamba update cmip6_preprocessing -y
The ! enables you to execute shell commands from the notebook.
But here is the catch: You will have to do this at the beginning of every session, because each time you log out the environment in the cloud is restored. Once you have done this you also might need to restart the notebook itself for changes to appear.
To remember this, you could put something like this in an early cell of your notebook:
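import cmip6_preprocessing
from packaging.version import Version  # compare versions numerically, not as strings
if Version(cmip6_preprocessing.__version__) < Version('0.4.0'):
    print('Manually update cmip6_pp with `mamba update cmip6_preprocessing -y`, and restart the notebook')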
thanks @jbusecke you are truly the best
Oh you might also want to plot the member_id (ds.attrs['variant_label'] ...I know, why are there two words...it's hella annoying) in the title so you can see which member is plotted.
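Something like the following could build that title. It is only a sketch: member_title is a made-up helper name, and it assumes both source_id and variant_label are present in the dataset attributes.

```python
def member_title(attrs):
    """Build a plot title like 'CanESM5 (r3i1p1f1)' from dataset attributes."""
    return f"{attrs['source_id']} ({attrs['variant_label']})"


# With a real dataset this would be member_title(ds.attrs),
# and the result could be passed to e.g. ax.set_title(...).
print(member_title({"source_id": "CanESM5", "variant_label": "r3i1p1f1"}))
```

Keeping a separator between the two attributes makes it easier to tell where the model name ends and the member label begins.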
Sorry, that was too much coffee and a lagging mouse.
@jbusecke Does this mean that the native install of cmip6_preprocessing on Pangeo has fallen behind? Can we get that updated easily?
I agree that the apparent multiple models could well be ensemble members of the same configuration. As Julius suggests, take a closer look at the specific details of the datasets to see how they differ. For example, the full name associated with each of these model iterations should reveal the difference somewhere.
so i used the strategy julius recommended to pick the first member of each model and printed out the source_id and variant_label for the selected models (this time it's so and thetao) and the output raised my hopes so high...
# output for so
CanESM5r3i1p1f1
CNRM-ESM2-1r3i1p1f2
MPI-ESM1-2-LRr9i1p1f1
GISS-E2-1-Gr6i1p1f1
IPSL-CM6A-LRr28i1p1f1
MIROC-ES2Lr7i1p1f2
MPI-ESM1-2-HRr3i1p1f1
ACCESS-ESM1-5r5i1p1f1
UKESM1-0-LLr6i1p1f3
CanESM5-CanOEr3i1p2f1
CESM2r2i1p1f1
MPI-ESM-1-2-HAMr2i1p1f1
CESM2-WACCM-FV2r1i1p1f1
CESM2-FV2r1i1p1f1
CESM2-WACCMr2i1p1f1
GISS-E2-1-G-CCr1i1p1f1
so with so much optimism, I plotted. and my hopes were destroyed.
as you can probably tell, these plots are so sus, most notably because some of them include latitudes that don't exist. so i restricted my x axis to latitudes that do exist, hoping that values in those real latitudes would kind of look normal but nope
so i'm not really sure what's going on now :/
What did you change in the plotting procedure? The y-range should not have changed like this just from selecting a different subset of models...
Take a closer look at the data arrays themselves, and the corresponding coordinates. Where and how are they changing to become this wider range, i.e. at what point in your code is this error coming in?
@gmacgilchrist i wasn't able to figure out when the error happened but somewhere along the way the y coordinates changed from latitude to (i think) indices. i started the whole thing over from scratch, avoided that error, and made some more reasonable plots. the only issue i'm having now is that a couple models are still goofy. CESM2-FV2 was funky for all the plots. For example, the right-most subplot on the bottom row...
i looked in the data array to see what y was, and it was wacky...
y (y) float64 -70.67 -70.14 -69.6 ... 4.767e+36 4.767e+36
none of the other models had this issue so that's weird
also, the GISS models gave some weird values for DIC
I printed out some info on one of the GISS models (you can see it here https://github.com/abby-baskind/seniorthesis/blob/089b9e650c5a5864a2bdabb4220165cca2436345/notebooks/better_fig_slices.ipynb at In[38]) and nothing stood out as weird. is there a better way to get more info on this model to try to figure out what's wrong?
@abby-baskind For the GISS models, try setting the colorbar limits to something reasonable. Sometimes, singular huge values can throw the limits off making the plot look crazy. That being said, the values here do look huge. I suspect that these data have been submitted incorrectly. Take a bit of a closer look, but if it really looks like the numbers are crazy, we can omit these models for now and follow up on getting the correct data.
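One way to pick colorbar limits that a few huge values cannot throw off is to take them from percentiles rather than from the min/max. The sketch below assumes numpy is available; robust_limits is a made-up helper and the sample numbers are invented for illustration, not taken from the GISS output.

```python
import numpy as np


def robust_limits(values, lower=2.0, upper=98.0):
    """Return (vmin, vmax) from percentiles of the data, ignoring NaNs."""
    data = np.asarray(values, dtype=float)
    vmin, vmax = np.nanpercentile(data, [lower, upper])
    return float(vmin), float(vmax)


# Made-up numbers: 100 plausible values plus one huge fill-like value.
sample = np.append(np.linspace(1.9, 2.4, 100), 1e36)
vmin, vmax = robust_limits(sample)
print(vmin, vmax)  # both stay within the plausible range
```

The resulting values can be passed as vmin and vmax to the plotting call. xarray's plotting functions also accept robust=True, which uses the 2nd and 98th percentiles in the same spirit.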
Yes, the latitude variable for CESM looks strange indeed. Can you look at the latitude more closely - is it just a few values that are off, or something more foundational? It seems like you might be inadvertently making some changes to the coordinates of the y dimension (this would likely be the reason why they appeared to be lost in the plot you did before). I can't see where that might be in your code. It could also be an issue with the CESM output - @jbusecke might be able to shed some light on this.
General advice when there seems to be an issue is to strip back the complexity of your code to look in a bit more detail at the individual models and the variables therein. Perhaps in regards to the GISS model for example, can you make a simple plot of just the surface DIC? If the numbers are still all over the place, that's a clear indication that the data has not been stored correctly.
Taking a bit of a closer look, I wonder if some of these issues (particularly as it concerns the latitude coordinate) may be arising in the combine_datasets function. I'm not familiar with that functionality, so perhaps @jbusecke can shed some light on whether it could be muddling up the coordinates for some models?
@abby-baskind You could take a look at this by looking just at the CESM model for which the issues appear, but before you do any combining of datasets. Can you just load that dataset and take a look - are the issues there from the start?
so for the CESM model, i plotted all the members (not using combine_datasets) and still got some funky coordinates, so it seems combine_datasets isn't the issue.
I peeked into the array to figure out where the coordinates went awry. Here's y[90:100]: [7.596783e+01, 7.623199e+01, 7.644568e+01, 7.662436e+01, 7.677391e+01, 7.689791e+01, 4.984605e+35, 4.984605e+35, 4.984605e+35, 4.984605e+35] (so it goes wrong at index 96). I restricted the x range to [-90, 90], and at least until 20°N it looked reasonable. I'm currently not sure what's happening north of 20°N.
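The same check can be scripted instead of eyeballing the array. This is a small sketch using only the standard library; first_invalid_latitude is a made-up helper, and "invalid" is assumed to mean non-finite or outside [-90, 90].

```python
import math


def first_invalid_latitude(values):
    """Return the index of the first value that is not a finite latitude, or None."""
    for i, v in enumerate(values):
        if not math.isfinite(v) or abs(v) > 90.0:
            return i
    return None


# The y[90:100] values quoted above.
y_slice = [7.596783e+01, 7.623199e+01, 7.644568e+01, 7.662436e+01, 7.677391e+01,
           7.689791e+01, 4.984605e+35, 4.984605e+35, 4.984605e+35, 4.984605e+35]
offset = 90  # the slice starts at index 90 of the full y array
print(offset + first_invalid_latitude(y_slice))  # 96
```

Running it over the full y array (offset 0) would show whether index 96 is the only place where the values break down or just the first.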
I'm still trying to figure out what's happening with the GISS models. I'm trying to figure out how to plot surface DIC like you suggested (so far, no luck), but in the meantime, I plotted all the GISS model members like I did with CESM (no combine_datasets) and restricted the colorbar to more reasonable values (between 0 and 5 mol/m^3 specifically). It still looks sus (which i suspected would happen since if you look at the plot I posted earlier, you can actually see that a lot of the DIC values are negative). honestly, I'm losing a lot of faith in the GISS models and they might just need to be scrapped for now
What might be worth trying is plotting these models for grid_label = gr and/or table_id = Oyr instead of gn and Omon. Thoughts on that?
Great investigative work, @abby-baskind ! 🕵🏼
Definitely looks like the coordinates are muddled with CESM. Perhaps this is something that can be looked at in cmip6_preprocessing @jbusecke ? This could well be solved by taking the gr grid for this model.
Also agree that GISS doesn't look great - best guess is that they've posted the wrong data (this happens sometimes). Suggest you scrap it for now. Surface data can be plotted using .isel(lev=0). Doing it in a proper map projection, using cartopy, is not necessary for the purposes of exploration/debugging.
Definitely worth checking to see if gr or yearly data for these models looks any better 👍🏼 Otherwise just press ahead for now with the models for which the data looks decent.
Sorry just got to this. I second @gmacgilchrist, that we need to look at these models individually.
1) How are you slicing/selecting a longitudinal section?
2) I think most of your issues are related to not having a proper latitude coordinate. The lat values are 2d and if you do any kind of reduction, you might lose that. y can be an index or a nominal lat coordinate. Both of these are not at all ideal in the Arctic region.
3) What could also really screw with the data is if the latitude values have nans in them. A simple way to check is to plot the data without coordinates, plt.imshow(ds.thetao.data), and see if that still looks wonky. If not, your problem is definitely with the coordinates.
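As a complement to the visual check in 3), the latitude field can be tested for nans and fill-like values directly. A rough sketch, assuming numpy is available; coordinate_report is a made-up helper and the small array is invented for illustration.

```python
import numpy as np


def coordinate_report(lat):
    """Count NaNs and out-of-range entries in a latitude array of any shape."""
    lat = np.asarray(lat, dtype=float)
    n_nan = int(np.isnan(lat).sum())
    # Comparisons with NaN are False, so NaNs are not counted twice here.
    with np.errstate(invalid="ignore"):
        n_out_of_range = int((np.abs(lat) > 90.0).sum())
    return {"size": int(lat.size), "nan": n_nan, "out_of_range": n_out_of_range}


# Made-up 2D latitude field with one NaN and one fill-like value.
lat2d = np.array([[-70.0, -69.5, np.nan],
                  [75.9, 76.2, 4.98e35]])
print(coordinate_report(lat2d))  # {'size': 6, 'nan': 1, 'out_of_range': 1}
```

With a real dataset this would be called as coordinate_report(ds.lat.values). Note also that plt.imshow expects a 2D array, so for a variable with time and depth dimensions one would first select a single slice, e.g. ds.thetao.isel(time=0, lev=0), assuming the dimensions are named time and lev.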
I would suggest:
Short term investigation: Plot a map of the surface data, and check if the models that you are looking at are particularly wonky (e.g. distorted) if you plot them just against x/y.
Longer term solution: I think the cleanest way to do this is to horizontally regrid all gn models onto regular lon/lat grids. That way you can be sure that a slice of your latitude is actually representing the appropriate position on the globe. I am actually working on implementing that in cmip6_pp, so if you can wait a bit, this might solve itself.
Just to add to the above:
Definitely looks like the coordinates are muddled with CESM. Perhaps this is something that can be looked at in cmip6_preprocessing @jbusecke ? This could well be solved by taking the gr grid for this model.
CESM definitely has nans in the lon/lat fields!
@gmacgilchrist so i tried to throw together some new figs for dissic, talk, thetao, and so for models from the subset of ['IPSL-CM6A-LR', 'CNRM-ESM2-1', 'CESM2', 'CanESM5', 'CanESM5-CanOE', 'MPI-ESM-1-2-HAM', 'UKESM1-0-LL', 'MPI-ESM1-2-LR', 'MPI-ESM1-2-HR', 'CESM2-WACCM', 'GISS-E2-1-G', 'GISS-E2-1-G-CC', 'MIROC-ES2L', 'ACCESS-ESM1-5','CESM2-WACCM-FV2', 'CESM2-FV2'] where grid_label = 'gn', experiment_id = 'historical', and table_id = 'Omon'. I was thinking the output would have 16 subfigures (one for each model), but that didn't happen. for example...in the plots of thetao (not unique to thetao but a good example), there are 4 CanESM5. presumably, there is a slight difference between these models (maybe different runs of the same model/a different attribute that i'm not seeing). this is a terribly non-specific question but why? and how do i deal with it?
here's a block of code that might be relevant but here's the link to the whole code https://github.com/abby-baskind/seniorthesis/blob/9b70164a7e6bc418fb56f15f8cf09c233e282c3a/notebooks/figs_slices.ipynb