abby-baskind / seniorthesis

0 stars 2 forks source link

problems with polar stereographic project of PpCO2 #12

Open abby-baskind opened 3 years ago

abby-baskind commented 3 years ago

So the PpCO2 calculation and visualization more or less works when I select a time and an x slice. For example,

ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200).sel(x=slice(180,200)).mean('x',keep_attrs=True)

And eventually the plot looks like this, which is more or less reasonable. The NorESM2-LM output is wacky, but that's just an issue with the gridding that I'm avoiding for now.

Screen Shot 2021-07-17 at 10 41 13 AM

But for the polar stereographic projection, I need to get rid of the x slice, so I chose ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200). But an error comes up that I don't understand and don't know how to deal with.

First, here's the relevant code.

def calc_PpCO2(ds):
    p = gsw.p_from_z(-1*ds['lev'], ds['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0)
    insitutemp = gsw.t_from_CT(ds['so'], ds['thetao'], p)  # 0 is just a filler rn 
    conversion =  1e6/1035
    results = pyco2.sys(par1=ds['talk']*conversion,par2=ds['dissic']*conversion,par1_type=1,par2_type=2,
                        pressure_out=0, temperature_out = ds['thetao'], pressure = p, 
                        temperature = insitutemp)
    return ds['talk'].copy(data=results['pCO2_out'])

ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200)
out = calc_PpCO2(ds)

And this error comes up

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-36-fd87b1424789> in <module>
      1 ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200)
----> 2 out = calc_PpCO2(ds)
      3 out

<ipython-input-8-e415eef5446b> in calc_PpCO2(ds)
      8     insitutemp = gsw.t_from_CT(ds['so'], ds['thetao'], p)  # 0 is just a filler rn
      9     conversion =  1e6/1035
---> 10     results = pyco2.sys(par1=ds['talk']*conversion,par2=ds['dissic']*conversion,par1_type=1,par2_type=2,
     11                         pressure_out=0, temperature_out = ds['thetao'], pressure = p,
     12                         temperature = insitutemp)

/srv/conda/envs/notebook/lib/python3.8/site-packages/PyCO2SYS/engine/nd.py in CO2SYS(par1, par2, par1_type, par2_type, salinity, temperature, pressure, temperature_out, pressure_out, total_ammonia, total_phosphate, total_silicate, total_sulfide, total_borate, total_calcium, total_fluoride, total_sulfate, opt_gas_constant, opt_k_bisulfate, opt_k_carbonic, opt_k_fluoride, opt_pH_scale, opt_total_borate, buffers_mode, k_ammonia, k_ammonia_out, k_borate, k_borate_out, k_bisulfate, k_bisulfate_out, k_CO2, k_CO2_out, k_carbonic_1, k_carbonic_1_out, k_carbonic_2, k_carbonic_2_out, k_fluoride, k_fluoride_out, k_phosphoric_1, k_phosphoric_1_out, k_phosphoric_2, k_phosphoric_2_out, k_phosphoric_3, k_phosphoric_3_out, k_silicate, k_silicate_out, k_sulfide, k_sulfide_out, k_water, k_water_out, k_calcite, k_calcite_out, k_aragonite, k_aragonite_out, fugacity_factor, fugacity_factor_out, gas_constant, gas_constant_out, total_alpha, k_alpha, k_alpha_out, total_beta, k_beta, k_beta_out, vp_factor, vp_factor_out, grads_of, grads_wrt, uncertainty_into, uncertainty_from)
    540         "total_beta": "total_beta",
    541     }
--> 542     if np.any(np.isin(list(args.keys()), list(totals_optional.keys()))):
    543         totals = {
    544             totals_optional[k]: v * 1e-6

AttributeError: 'NoneType' object has no attribute 'keys'

So I tried to figure out what the attributes of ds['talk'] was (since it's the first variable called in the function), and indeed it didn't have a 'keys' attribute.

ds = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200)
ds['talk'].attrs

{'cell_measures': 'area: areacello volume: volcello',
 'cell_methods': 'area: mean where sea time: mean',
 'comment': 'Model data on the 1x1 grid includes values in all cells for which ocean cells on the native grid cover more than 52.5 percent of the 1x1 grid cell. This 52.5 percent cutoff was chosen to produce ocean surface area on the 1x1 grid as close as possible to ocean surface area on the native grid, while not introducing fractional cell coverage.',
 'description': 'total alkalinity equivalent concentration (including carbonate, borate, phosphorus, silicon, and nitrogen components)',
 'frequency': 'mon',
 'id': 'talk',
 'long_name': 'Total Alkalinity',
 'mipTable': 'Omon',
 'out_name': 'talk',
 'prov': 'Omon ((isd.003))',
 'realm': 'ocnBgchem',
 'standard_name': 'sea_water_alkalinity_expressed_as_mole_equivalent',
 'time': 'time',
 'time_label': 'time-mean',
 'time_title': 'Temporal mean',
 'title': 'Total Alkalinity',
 'type': 'real',
 'units': 'mol m-3',
 'variable_id': 'talk'}

So I guess I'm wondering why this happens (but only when I don't select an x slice) and what the work around is

jbusecke commented 3 years ago

I have no clue about CO2SYS, but the way I read it args is supposed to be a dictionary (but turns out to be None here). @gmacgilchrist can you help debug this?

gmacgilchrist commented 3 years ago

Hi @abby-baskind OK, a few things...

Was this piece of code inherited from me: return ds['talk'].copy(data=results['pCO2_out'])? I think it is not the best way to do this. I would suggest that your function to calculate PCO2 should output all of results, and from there you can select the variable of interest (here pco2_out). Unfortunately, pyco2.sys outputs numpy arrays rather than xarray DataArrays (which is what we really want to work with). That's why we have to put them "back in" to DataArrays by copying an array that we already have and replacing the data with the numpy arrays (that's what ds['talk'].copy(data=results['pCO2_out']) does). You probably then want to rename the variable in that DataArray, which will still be called talk but of course is now pCO2_out. You can do this with `.rename({'talk':'pCO2_out').

As for why you are getting this error, something is happening in the pCO2 calculation itself. Remember that your input data here still has a vertical dimension (you have only subselected for a single time point, meaning that the dimensions of ds are x,y,lev). That should not itself cause a problem (it doesn't look like this is the source of the error), but it is something that differs from your sections (which had only y and lev).

I would also recommend calculating pressure and in situ temperature outside of the pCO2 function, and include it as a variable in ds. I don't think there's any reason to do these calcs within the function (possibly performance related things but that could be addressed further down the line), and it could be muddling some things around trying to do all that at once.

I would recommend you try to isolate the bug by simplifying the process and understanding the input and output of each line in that function. A common way to do this is to write some code that is not within a function, which allows you to pick out what's happening at each step (I find functions hard to debug because I can't see explicitly how each piece works). Once you have a working snippet of code, you can rebuild the function based on that to do the same thing for each of the models.

Let me know if you still don't get it working today, with fresh eyes, and I will have a look at running the code.

abby-baskind commented 3 years ago

So I broke it down some time over the weekend, but I did it again today just to double check. As I kind of expected, the error is coming up with pyco2.sys(par1, par2, par1_type, par2_type, **kwargs) function. I'm still very confused why this is happening. But I did write up a comparison of sorts between the version that works and the version that doesn't. Also, I'm not sure any of this makes sense or is helpful. It's very stream-of-consciousness.

So this--the one with the x-slice--is the one that works

  1. ds_1 = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200).sel(x=slice(180,200)).mean('x',keep_attrs=True) Output is xarray.Dataset with dimensions (bnds: 2, lev: 33, vertex: 4, y: 180). a. Notably, ds_1.thetao, ds_1.dissic, and ds_1.talk have dimensions (lev: 33, y: 180). ds_1.so has dimensions (y: 180, x: 360)

  2. p_1 = gsw.p_from_z(-1*ds_1['lev'], ds_1['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0) Output is xarray.DataArray with dimensions (lev: 33, y: 180).

  3. insitutemp_1 = gsw.t_from_CT(ds_1['so'], ds_1['thetao'], p_1) Output is xarray.DataArray with dimensions (lev: 33, y: 180).

  4. results_1 = pyco2.sys(par1=ds_1.talk*conversion,par2=ds_1.dissic*conversion,par1_type=1,par2_type=2, pressure_out=0, temperature_out = ds_1.thetao, pressure = p_1, temperature = insitutemp_1) Output:

    'pCO2_out': array([[         nan,          nan,          nan, ..., 309.44352178,
         309.50128487, 309.3143347 ],
        [         nan,          nan,          nan, ..., 309.44460043,
         309.50219954, 309.31686714],
        [         nan,          nan,          nan, ..., 309.44678642,
         309.50517289, 309.31860388],
        ...,
        [         nan,          nan,          nan, ...,          nan,
                  nan,          nan],
        [         nan,          nan,          nan, ...,          nan,
                  nan,          nan],
        [         nan,          nan,          nan, ...,          nan,
                  nan,          nan]]),

And this one doesn't work

  1. ds_2 = dd_new_new['CESM2-FV2.gr.historical.Omon'].isel(time=1200) Output is xarray.Dataset with dimensions (bnds: 2, lev: 33, vertex: 4, x: 360, y: 180). So the only thing different here is the dimensions, but the other attributes are the same. a. What might be kind of sus is that ds_2.talk and ds_2.dissic have dimensions (lev: 33, y: 180, x: 360) but ds_2.thetao and ds_2.so have dimensions (y: 180, x: 360)

  2. p_2 = gsw.p_from_z(-1*ds_2['lev'], ds_2['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0) Output is xarray.DataArray with dimensions (lev: 33, y: 180) and is nearly identical to the other example.

  3. insitutemp_2 = gsw.t_from_CT(ds_2['so'], ds_2['thetao'], p_2) Output is xarray.DataArray with dimensions (lev: 33, y: 180, x: 360), so the dimensions are different, but it still is the same type and has the same attributes as the previous example.

  4. results_2 = pyco2.sys(par1=ds_2.talk*conversion,par2=ds_2.dissic*conversion,par1_type=1,par2_type=2, pressure_out=0, temperature_out = ds_2.thetao, pressure = p_2, temperature = insitutemp_2) Output:

    
    PyCO2SYS error: input shapes cannot be broadcast together.

AttributeError Traceback (most recent call last)

in 1 conversion = 1e6/1035 ----> 2 results_2 = pyco2.sys(par1=ds_2.talk*conversion,par2=ds_2.dissic*conversion,par1_type=1,par2_type=2, 3 pressure_out=0, temperature_out = ds_2.thetao, pressure = p_2, 4 temperature = insitutemp_2) 5 results_2 /srv/conda/envs/notebook/lib/python3.8/site-packages/PyCO2SYS/engine/nd.py in CO2SYS(par1, par2, par1_type, par2_type, salinity, temperature, pressure, temperature_out, pressure_out, total_ammonia, total_phosphate, total_silicate, total_sulfide, total_borate, total_calcium, total_fluoride, total_sulfate, opt_gas_constant, opt_k_bisulfate, opt_k_carbonic, opt_k_fluoride, opt_pH_scale, opt_total_borate, buffers_mode, k_ammonia, k_ammonia_out, k_borate, k_borate_out, k_bisulfate, k_bisulfate_out, k_CO2, k_CO2_out, k_carbonic_1, k_carbonic_1_out, k_carbonic_2, k_carbonic_2_out, k_fluoride, k_fluoride_out, k_phosphoric_1, k_phosphoric_1_out, k_phosphoric_2, k_phosphoric_2_out, k_phosphoric_3, k_phosphoric_3_out, k_silicate, k_silicate_out, k_sulfide, k_sulfide_out, k_water, k_water_out, k_calcite, k_calcite_out, k_aragonite, k_aragonite_out, fugacity_factor, fugacity_factor_out, gas_constant, gas_constant_out, total_alpha, k_alpha, k_alpha_out, total_beta, k_beta, k_beta_out, vp_factor, vp_factor_out, grads_of, grads_wrt, uncertainty_into, uncertainty_from) 540 "total_beta": "total_beta", 541 } --> 542 if np.any(np.isin(list(args.keys()), list(totals_optional.keys()))): 543 totals = { 544 totals_optional[k]: v * 1e-6 AttributeError: 'NoneType' object has no attribute 'keys' ``` Here, I don't know which object is supposedly `NoneType`, and I don't know what "input shapes cannot be broadcast together" means. The only thing that stands out as potentially being problematic is 1a in each example, but I don't think I fully understand what's going on there. But if you want to look at the entire output, it's all [here](https://github.com/abby-baskind/seniorthesis/blob/90e3c7657e9f6e3f85c24df6d0d5c40c2d2552fb/notebooks/ugh.ipynb).
gmacgilchrist commented 3 years ago

Great stuff!

Broadcasting is when you are trying to perform an operation (e.g. adding) on two datasets that have different dimension or coordinate information. Xarray "broadcasts" the dimensions of one array to match those of the other. For example, imagine one array that has x and y dimensions, and another that has x, y and time. When you add the two arrays, xarray will expand (replicate) the first array along the time dimension, so that the two arrays can be correctly added together. This is called "broadcasting". When there is an error like what you had, it usually implies that there is some error in trying to match up the dimensions of the two arrays.

It looks like your error might be coming from the different dimensions of the hydrographic and biogeochemical variables. It looks like thetao and so have lost their lev dimension somewhere - perhaps earlier in the code you selected a specific depth level for these variables?

I will take a look at the code tomorrow morning. Can you point me to the latest notebook where this bug appears?

abby-baskind commented 3 years ago

@gmacgilchrist here's the notebook i've been working with but it's a bit chaotic so i might start fresh. if there is a new and improved code i'll let you know

abby-baskind commented 3 years ago

@gmacgilchrist actually here's an updated one https://github.com/abby-baskind/seniorthesis/blob/56889ebe5f3311d29b9a104c4e5efb0593373957/notebooks/pco2_debug.ipynb

gmacgilchrist commented 3 years ago

OK so I took a little look at this. I didn't fully work out the source of your error, but I got something working that might be a good starting point. I think the trick is that I selected the depth level before calculating the potential pCO2. This code snippet works.

Screen Shot 2021-07-20 at 5 51 00 PM

Note that, here, I am setting pressure to zero everywhere because I am just doing this for the very surface (likewise in this context calculating the in situ temperature here is probably a useless step, I could just set it to be exactly the potential temperature). If you were to do this for a deeper level, you would have to calculate the pressure properly.

gmacgilchrist commented 3 years ago

Found the error!

This function : p_2 = gsw.p_from_z(-1*ds_2['lev'], ds_2['y'], geo_strf_dyn_height=0, sea_surface_geopotential=0) is creating an array that has only the lev and y dimensions. You need to expand/replicate it along the x dimension, before feeding it into the pyco2sys function. That's where the "broadcasting" error was coming from.

There are a few ways to do this, but I think the simplest is to multiply the function above by an xarray DataArray object that has only the x dimension and is filled with ones. You can create such an array using xr.ones_like(ds_2['x']). That will allow you to pass all these 3D arrays to pyCO2sys.

However, I think that the solution that I posted above is actually preferable. The reason is that the pco2 calculation is rather computationally expensive, so using it on a large array can be very slow. So, better to subset the array first, i.e. select a specific depth level, before doing the pyco2sys calculation.

Make sense?