Open abby-baskind opened 3 years ago
I have no clue about CO2SYS, but the way I read it, `args` is supposed to be a dictionary (but turns out to be `None` here). @gmacgilchrist can you help debug this?
Hi @abby-baskind OK, a few things...
Was this piece of code inherited from me: `return ds['talk'].copy(data=results['pCO2_out'])`? I think it is not the best way to do this. I would suggest that your function to calculate pCO2 should output all of `results`, and from there you can select the variable of interest (here `pCO2_out`). Unfortunately, `pyco2.sys` outputs numpy arrays rather than xarray DataArrays (which is what we really want to work with). That's why we have to put them "back in" to DataArrays by copying an array that we already have and replacing the data with the numpy arrays (that's what `ds['talk'].copy(data=results['pCO2_out'])` does). You probably then want to rename that DataArray, which will still be called `talk` but of course now holds `pCO2_out`. You can do this with `.rename('pCO2_out')`.
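As a minimal sketch of the copy-and-rename pattern described above, here is a version that runs on synthetic data. The dataset, its sizes, and the `results` dictionary are stand-ins made up for illustration; they are not real `pyco2.sys` output.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the model dataset (names and sizes are illustrative).
ds = xr.Dataset(
    {"talk": (("lev", "y"), np.full((3, 4), 2300.0))},
    coords={"lev": [5.0, 15.0, 25.0], "y": [-60.0, -30.0, 0.0, 30.0]},
)

# Stand-in for the dictionary of numpy arrays returned by pyco2.sys;
# the values are made up, not a real carbonate-system calculation.
results = {"pCO2_out": np.full((3, 4), 310.0)}

# Copy an existing DataArray to reuse its dims and coords, swap in the
# numpy data, then give the array a name that matches its new contents.
pco2 = ds["talk"].copy(data=results["pCO2_out"]).rename("pCO2_out")
print(pco2.name, pco2.dims)
```

Note that `.copy(data=...)` also carries over the attributes of `talk` (units, long name and so on), so those probably need updating by hand as well.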
As for why you are getting this error, something is happening in the pCO2 calculation itself. Remember that your input data here still has a vertical dimension (you have only subselected for a single time point, meaning that the dimensions of `ds` are `x`, `y`, `lev`). That should not itself cause a problem (it doesn't look like this is the source of the error), but it is something that differs from your sections (which had only `y` and `lev`).
I would also recommend calculating pressure and in situ temperature outside of the pCO2 function, and including them as variables in `ds`. I don't think there's any reason to do these calcs within the function (possibly performance related things but that could be addressed further down the line), and it could be muddling some things around trying to do all that at once.
I would recommend you try to isolate the bug by simplifying the process and understanding the input and output of each line in that function. A common way to do this is to write some code that is not within a function, which allows you to pick out what's happening at each step (I find functions hard to debug because I can't see explicitly how each piece works). Once you have a working snippet of code, you can rebuild the function based on that to do the same thing for each of the models.
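One way to do this kind of step-by-step check, sketched here on synthetic arrays, is to print the dimensions and shape of every input before it goes into the calculation. The variable names and sizes below are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
import xarray as xr

# Synthetic inputs with deliberately mismatched dimensions (illustrative only).
lev, y, x = 3, 4, 5
inputs = {
    "talk": xr.DataArray(np.ones((lev, y, x)), dims=("lev", "y", "x")),
    "dissic": xr.DataArray(np.ones((lev, y, x)), dims=("lev", "y", "x")),
    "pressure": xr.DataArray(np.ones((lev, y)), dims=("lev", "y")),
}

# Print the dimensions and shape of every input, one per line.
for name, da in inputs.items():
    print(f"{name:10s} dims={da.dims} shape={da.shape}")

# Collect the distinct dimension tuples; more than one means the inputs differ.
distinct_dims = {da.dims for da in inputs.values()}
print(len(distinct_dims))
```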
Let me know if you still don't get it working today, with fresh eyes, and I will have a look at running the code.
So I broke it down some time over the weekend, but I did it again today just to double check. As I kind of expected, the error is coming up with the `pyco2.sys(par1, par2, par1_type, par2_type, **kwargs)` function. I'm still very confused why this is happening. But I did write up a comparison of sorts between the version that works and the version that doesn't. Also, I'm not sure any of this makes sense or is helpful. It's very stream-of-consciousness.
So this--the one with the x-slice--is the one that works
1. `ds_1 = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200).sel(x=slice(180,200)).mean('x',keep_attrs=True)`
   Output is an `xarray.Dataset` with dimensions `(bnds: 2, lev: 33, vertex: 4, y: 180)`.
   a. Notably, `ds_1.thetao`, `ds_1.dissic`, and `ds_1.talk` have dimensions `(lev: 33, y: 180)`. `ds_1.so` has dimensions `(y: 180, x: 360)`.
2. `p_1 = gsw.p_from_z(-1*ds_1['lev'], ds_1['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0)`
   Output is an `xarray.DataArray` with dimensions `(lev: 33, y: 180)`.
3. `insitutemp_1 = gsw.t_from_CT(ds_1['so'], ds_1['thetao'], p_1)`
   Output is an `xarray.DataArray` with dimensions `(lev: 33, y: 180)`.
4. `results_1 = pyco2.sys(par1=ds_1.talk*conversion, par2=ds_1.dissic*conversion, par1_type=1, par2_type=2, pressure_out=0, temperature_out=ds_1.thetao, pressure=p_1, temperature=insitutemp_1)`
   Output:
'pCO2_out': array([[ nan, nan, nan, ..., 309.44352178,
309.50128487, 309.3143347 ],
[ nan, nan, nan, ..., 309.44460043,
309.50219954, 309.31686714],
[ nan, nan, nan, ..., 309.44678642,
309.50517289, 309.31860388],
...,
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan]]),
And this one doesn't work
1. `ds_2 = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200)`
   Output is an `xarray.Dataset` with dimensions `(bnds: 2, lev: 33, vertex: 4, x: 360, y: 180)`. So the only thing different here is the dimensions, but the other attributes are the same.
   a. What might be kind of sus is that `ds_2.talk` and `ds_2.dissic` have dimensions `(lev: 33, y: 180, x: 360)` but `ds_2.thetao` and `ds_2.so` have dimensions `(y: 180, x: 360)`.
2. `p_2 = gsw.p_from_z(-1*ds_2['lev'], ds_2['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0)`
   Output is an `xarray.DataArray` with dimensions `(lev: 33, y: 180)` and is nearly identical to the other example.
3. `insitutemp_2 = gsw.t_from_CT(ds_2['so'], ds_2['thetao'], p_2)`
   Output is an `xarray.DataArray` with dimensions `(lev: 33, y: 180, x: 360)`, so the dimensions are different, but it still is the same type and has the same attributes as the previous example.
4. `results_2 = pyco2.sys(par1=ds_2.talk*conversion, par2=ds_2.dissic*conversion, par1_type=1, par2_type=2, pressure_out=0, temperature_out=ds_2.thetao, pressure=p_2, temperature=insitutemp_2)`
   Output:
PyCO2SYS error: input shapes cannot be broadcast together.
AttributeError Traceback (most recent call last)
Great stuff!
Broadcasting is when you are trying to perform an operation (e.g. adding) on two datasets that have different dimension or coordinate information. Xarray "broadcasts" the dimensions of one array to match those of the other. For example, imagine one array that has x and y dimensions, and another that has x, y and time. When you add the two arrays, xarray will expand (replicate) the first array along the time dimension, so that the two arrays can be correctly added together. This is called "broadcasting". When there is an error like what you had, it usually implies that there is some error in trying to match up the dimensions of the two arrays.
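Here is a small sketch of the x/y/time example above, using synthetic arrays. It also shows the contrast with plain numpy, which lines shapes up by position rather than by dimension name. The assumption here, which fits the error message but which I have not checked against the PyCO2SYS source, is that `pyco2.sys` works on the underlying numpy arrays and so follows the numpy rule.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.ones((2, 3)), dims=("x", "y"))
b = xr.DataArray(np.ones((2, 3, 4)), dims=("x", "y", "time"))

# xarray matches dimensions by name and replicates `a` along "time".
total = a + b
print(total.dims, total.shape)

# Plain numpy aligns shapes starting from the trailing axis, so the same
# pair of arrays cannot be combined without reshaping first.
try:
    a.values + b.values
    numpy_ok = True
except ValueError:
    numpy_ok = False
print(numpy_ok)
```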
It looks like your error might be coming from the different dimensions of the hydrographic and biogeochemical variables. It looks like `thetao` and `so` have lost their `lev` dimension somewhere - perhaps earlier in the code you selected a specific depth level for these variables?
I will take a look at the code tomorrow morning. Can you point me to the latest notebook where this bug appears?
@gmacgilchrist here's the notebook i've been working with but it's a bit chaotic so i might start fresh. if there is a new and improved code i'll let you know
@gmacgilchrist actually here's an updated one https://github.com/abby-baskind/seniorthesis/blob/56889ebe5f3311d29b9a104c4e5efb0593373957/notebooks/pco2_debug.ipynb
OK so I took a little look at this. I didn't fully work out the source of your error, but I got something working that might be a good starting point. I think the trick is that I selected the depth level before calculating the potential pCO2. This code snippet works.
Note that, here, I am setting pressure to zero everywhere because I am just doing this for the very surface (likewise in this context calculating the in situ temperature here is probably a useless step, I could just set it to be exactly the potential temperature). If you were to do this for a deeper level, you would have to calculate the pressure properly.
Found the error!
This function: `p_2 = gsw.p_from_z(-1*ds_2['lev'], ds_2['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0)` is creating an array that has only the `lev` and `y` dimensions. You need to expand/replicate it along the `x` dimension before feeding it into the pyco2sys function. That's where the "broadcasting" error was coming from.
There are a few ways to do this, but I think the simplest is to multiply the output of the function above by an xarray DataArray object that has only the `x` dimension and is filled with ones. You can create such an array using `xr.ones_like(ds_2['x'])`. That will allow you to pass all these 3D arrays to pyCO2sys.
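A minimal sketch of that expansion on synthetic data is below. The small `ds_2` and the hand-built `p_2` are stand-ins for the real dataset and the `gsw.p_from_z` output; the final transpose is an extra step I am assuming is useful so that the axis order matches `talk` once the data are treated as plain numpy arrays.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the model dataset (sizes are illustrative).
ds_2 = xr.Dataset(
    {"talk": (("lev", "y", "x"), np.ones((3, 4, 5)))},
    coords={
        "lev": [5.0, 15.0, 25.0],
        "y": [-60.0, -30.0, 0.0, 30.0],
        "x": [0.0, 1.0, 2.0, 3.0, 4.0],
    },
)

# Stand-in for the gsw.p_from_z output: pressure with only (lev, y) dims.
p_2 = xr.DataArray(
    np.array([5.0, 15.0, 25.0])[:, None] * np.ones((3, 4)),
    dims=("lev", "y"),
    coords={"lev": ds_2["lev"].values, "y": ds_2["y"].values},
)

# Multiply by an array of ones along x to replicate p_2 along that dimension.
p_2_3d = p_2 * xr.ones_like(ds_2["x"])

# Put the axes in the same order as talk before handing over numpy data.
p_2_3d = p_2_3d.transpose(*ds_2["talk"].dims)
print(p_2_3d.dims, p_2_3d.shape)
```

`xr.broadcast` is another way to get arrays onto a common set of dimensions, if you prefer not to multiply by ones.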
However, I think that the solution that I posted above is actually preferable. The reason is that the pco2 calculation is rather computationally expensive, so using it on a large array can be very slow. So, better to subset the array first, i.e. select a specific depth level, before doing the pyco2sys calculation.
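To give a rough sense of the saving, here is a sketch on a synthetic dataset with the same sizes as the CESM2-FV2 output described earlier in the thread (33 levels, 180 by 360 grid). It assumes that index 0 along `lev` is the surface level, which would need checking for each model, and it only counts grid points rather than running the carbonate-system calculation.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with the grid sizes quoted earlier in the thread.
ds = xr.Dataset(
    {
        "talk": (("lev", "y", "x"), np.ones((33, 180, 360))),
        "dissic": (("lev", "y", "x"), np.ones((33, 180, 360))),
    }
)

# Select the shallowest level (assumed to be index 0) before any
# expensive per-point calculation.
surface = ds.isel(lev=0)

# Number of grid points a per-point calculation would have to process.
n_full = ds["talk"].size
n_surface = surface["talk"].size
print(n_full, n_surface)
```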
Make sense?
So the PpCO2 calculation and visualization more or less works when I select a time and an x slice. For example,
ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200).sel(x=slice(180,200)).mean('x',keep_attrs=True)
And eventually the plot looks like this, which is more or less reasonable. The NorESM2-LM output is wacky, but that's just an issue with the gridding that I'm avoiding for now.
But for the polar stereographic projection, I need to get rid of the x slice, so I chose `ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200)`. But an error comes up that I don't understand and don't know how to deal with. First, here's the relevant code.
And this error comes up
So I tried to figure out what the attributes of `ds['talk']` were (since it's the first variable called in the function), and indeed it didn't have a 'keys' attribute. So I guess I'm wondering why this happens (but only when I don't select an x slice) and what the workaround is.