Calculation of potential pCO2

gmacgilchrist commented 3 years ago

A key part of the analysis will be to calculate potential pCO2 in the models. It is this quantity that is seen in observations to have a substantial mid-depth maximum that connects to the Southern Ocean surface.

Parameters in the carbonate system can be evaluated using the CO2SYS program, available as a python package here. See the installation and usage instructions there. You should be able to install the package from the terminal command line on Pangeo, in your notebook environment (you may have to periodically reinstall it if the environment gets refreshed - I'm not completely sure how this work on Pangeo).

For the calculation itself, you want to input DIC and Alkalinity and from that recover all of the carbonate system parameters. To get the potential pCO2, you need to get the pCO2 that the water would have, if it were raised to surface pressure. To do that you need to put pressure_in and temperature_in as the in situ pressure and temperature (i.e. the pressure and temperature where the "measurements" were made - of course we're not working with measurements, but model output, so we're talking about at the depth of the grid cell), and the pressure_out as zero and temperature_out as the potential temperature. Note that the model output will give you only the potential temperature, so you will have to first calculate the in situ pressure from that. You can do that using the TEOS-10 seawater calculations, for which there is again a python package. You should also be able to use this package to get the pressure from the depth (although this is a very small correction).

The variable that you want will then be in the output dictionary as pco2_out. You will see something of this calculation in my notebook that I showed you before, but that calculation was done rather quickly and I think contains some mistakes. Undoubtedly there will be some implementation issues but hopefully this gives you a good starting point.

abby-baskind commented 3 years ago

i wasn't able to find a function that calculates in situ temperature from potential temperature in the TEOS-10/gsw documentation. is there something i overlooked or is there a clever way around this?

gmacgilchrist commented 3 years ago

Use this

In the first instance, you can assume that the temperature given in the model is pretty close to CT, which is "conservative temperature". We may need to do this a little more carefully, but I will give it some thought.

abby-baskind commented 3 years ago

hi @gmacgilchrist, i'm working on my pCO2 code now and I'm coming across an error. (Also this is not an urgent issue at all so we can just talk about it when we meet next if you want)

Here's a block of my relevant code

variables = ['dissic','talk','so','thetao']
z_kwargs = {'consolidated': True, 'decode_times':False}
query = dict(experiment_id=['historical'], table_id=['Omon'], 
             variable_id=variables,
             grid_label=['gr'],
             source_id=['GFDL-ESM4',
                        'CESM2-FV2','CESM2','MRI-ESM2-0',
                        'CESM2-WACCM-FV2','GFDL-CM4','CESM2-WACCM',
                        'E3SM-1-1-ECA', 'E3SM-1-1','E3SM-1-0','NorCPM1'])
cat = col.search(**query)
# print(cat.df['source_id'].unique())
dset_dict = cat.to_dataset_dict(zarr_kwargs=z_kwargs, storage_options={'token': 'anon'},preprocess=combined_preprocessing)

And I'm getting this error

AggregationError: 
        Failed to join/concatenate datasets in group with key=CMIP.E3SM-Project.E3SM-1-0.historical.Omon.gr along a new dimension `member_id`.

        *** Arguments passed to xarray.concat() ***:

        - objs: a list of 5 datasets
        - dim: <xarray.DataArray 'member_id' (member_id: 5)>
array(['r1i1p1f1', 'r2i1p1f1', 'r4i1p1f1', 'r3i1p1f1', 'r5i1p1f1'],
      dtype='<U8')
Dimensions without coordinates: member_id
        - data_vars: ['so']
        - and kwargs: {'coords': 'minimal', 'compat': 'override'}

This error occurred for 'E3SM-1-1', 'E3SM-1-0', and 'NorCPM1' but none of the others. Luckily, the other 8 models worked fine in this part of the code, so I'm able to move on to working on the actual calculation part of the code. However, since I have so few models to work with, I really want the 3 error-inducing models to work. I think generally that the error is saying that models with multiple members are causing problems for aggregation. So, do you know more about what this error means and if/how I can fix it?

Also here's a link to the full code if you need more info/context https://github.com/abby-baskind/seniorthesis/blob/2727547868fa8bfc2ac7db22eb9a65124526d781/notebooks/pco2_calc.ipynb

gmacgilchrist commented 3 years ago

@abby-baskind Yes, I think there is some issue with separate members. I will take a look at the code today and let you know what I find.

gmacgilchrist commented 3 years ago

OK, I found an issue here addressing problems with selecting single members from model configurations, which appeared to be the route of your issues.

After making the query from the catalog, I processed the cat object to select only the "first" member_id using the following commands:

grouped = cat.df.groupby(cat.groupby_attrs)
cat.df = grouped.first().reset_index()

This groups the dataframe based on the attributes (i.e. source_id, experiment_id, etc.), then selects only the first entry for each attribute. This appears to have worked, at least to the stage of being able to combine all of the datasets. See a slightly modified version of your code here.

Let me know if it seems to work for you?

The syntax here is all based on the pandas package, which is what is working in the background organizing things for intake.