abby-baskind / seniorthesis

0 stars 2 forks source link

Calculation of potential pCO2 #4

Open gmacgilchrist opened 3 years ago

gmacgilchrist commented 3 years ago

A key part of the analysis will be to calculate potential pCO2 in the models. It is this quantity that is seen in observations to have a substantial mid-depth maximum that connects to the Southern Ocean surface.

Parameters in the carbonate system can be evaluated using the CO2SYS program, available as a python package here. See the installation and usage instructions there. You should be able to install the package from the terminal command line on Pangeo, in your notebook environment (you may have to periodically reinstall it if the environment gets refreshed - I'm not completely sure how this work on Pangeo).

For the calculation itself, you want to input DIC and Alkalinity and from that recover all of the carbonate system parameters. To get the potential pCO2, you need to get the pCO2 that the water would have, if it were raised to surface pressure. To do that you need to put pressure_in and temperature_in as the in situ pressure and temperature (i.e. the pressure and temperature where the "measurements" were made - of course we're not working with measurements, but model output, so we're talking about at the depth of the grid cell), and the pressure_out as zero and temperature_out as the potential temperature. Note that the model output will give you only the potential temperature, so you will have to first calculate the in situ pressure from that. You can do that using the TEOS-10 seawater calculations, for which there is again a python package. You should also be able to use this package to get the pressure from the depth (although this is a very small correction).

The variable that you want will then be in the output dictionary as pco2_out. You will see something of this calculation in my notebook that I showed you before, but that calculation was done rather quickly and I think contains some mistakes. Undoubtedly there will be some implementation issues but hopefully this gives you a good starting point.

abby-baskind commented 3 years ago

i wasn't able to find a function that calculates in situ temperature from potential temperature in the TEOS-10/gsw documentation. is there something i overlooked or is there a clever way around this?

gmacgilchrist commented 3 years ago

Use this

In the first instance, you can assume that the temperature given in the model is pretty close to CT, which is "conservative temperature". We may need to do this a little more carefully, but I will give it some thought.

abby-baskind commented 3 years ago

hi @gmacgilchrist, i'm working on my pCO2 code now and I'm coming across an error. (Also this is not an urgent issue at all so we can just talk about it when we meet next if you want)

Here's a block of my relevant code

variables = ['dissic','talk','so','thetao']
z_kwargs = {'consolidated': True, 'decode_times':False}
query = dict(experiment_id=['historical'], table_id=['Omon'], 
             variable_id=variables,
             grid_label=['gr'],
             source_id=['GFDL-ESM4',
                        'CESM2-FV2','CESM2','MRI-ESM2-0',
                        'CESM2-WACCM-FV2','GFDL-CM4','CESM2-WACCM',
                        'E3SM-1-1-ECA', 'E3SM-1-1','E3SM-1-0','NorCPM1'])
cat = col.search(**query)
# print(cat.df['source_id'].unique())
dset_dict = cat.to_dataset_dict(zarr_kwargs=z_kwargs, storage_options={'token': 'anon'},preprocess=combined_preprocessing)

And I'm getting this error

AggregationError: 
        Failed to join/concatenate datasets in group with key=CMIP.E3SM-Project.E3SM-1-0.historical.Omon.gr along a new dimension `member_id`.

        *** Arguments passed to xarray.concat() ***:

        - objs: a list of 5 datasets
        - dim: <xarray.DataArray 'member_id' (member_id: 5)>
array(['r1i1p1f1', 'r2i1p1f1', 'r4i1p1f1', 'r3i1p1f1', 'r5i1p1f1'],
      dtype='<U8')
Dimensions without coordinates: member_id
        - data_vars: ['so']
        - and kwargs: {'coords': 'minimal', 'compat': 'override'}

This error occurred for 'E3SM-1-1', 'E3SM-1-0', and 'NorCPM1' but none of the others. Luckily, the other 8 models worked fine in this part of the code, so I'm able to move on to working on the actual calculation part of the code. However, since I have so few models to work with, I really want the 3 error-inducing models to work. I think generally that the error is saying that models with multiple members are causing problems for aggregation. So, do you know more about what this error means and if/how I can fix it?

Also here's a link to the full code if you need more info/context https://github.com/abby-baskind/seniorthesis/blob/2727547868fa8bfc2ac7db22eb9a65124526d781/notebooks/pco2_calc.ipynb

gmacgilchrist commented 3 years ago

@abby-baskind Yes, I think there is some issue with separate members. I will take a look at the code today and let you know what I find.

gmacgilchrist commented 3 years ago

OK, I found an issue here addressing problems with selecting single members from model configurations, which appeared to be the route of your issues.

After making the query from the catalog, I processed the cat object to select only the "first" member_id using the following commands:

grouped = cat.df.groupby(cat.groupby_attrs)
cat.df = grouped.first().reset_index()

This groups the dataframe based on the attributes (i.e. source_id, experiment_id, etc.), then selects only the first entry for each attribute. This appears to have worked, at least to the stage of being able to combine all of the datasets. See a slightly modified version of your code here.

Let me know if it seems to work for you?

The syntax here is all based on the pandas package, which is what is working in the background organizing things for intake.

jbusecke commented 3 years ago

First of all, it seems that you have discovered a bug in either intake-esm or cmip6_preprocessing 🎉. I will investigate further what is going on there.

An alternative option to @gmacgilchrist excellent answer is to let intake-esm not aggregate (cat.to_dataset_dict(..., aggregate=False)) over the members in the first place (which seems to give you problems) and then select a single member afterwards...just saying this cause cmip6_preprocessing can do that for you already 😁: https://cmip6-preprocessing.readthedocs.io/en/latest/postprocessing.html#Custom-combination-functions

This might give you slightly more flexibility for later, when you might want to consider several members...

jbusecke commented 3 years ago

You might need to combine the variables manually too though. This is described in the cmip6_pp docs aswell. Let me know if that works.

jbusecke commented 3 years ago

When I run your code I actually run into an issue with "NorCPM1" first. Did you experience this too?

jbusecke commented 3 years ago

I was able to solve the problem with "NorCPM1" by using use_cftime=True instead of decode_times=False. Is there a particular reason you are not decoding the time?

jbusecke commented 3 years ago

Ok so it seems that there are two separate issues: The NorCPM1 issue seems to work as described above, but the E3SM-1-0 model has some other issues. The underlying problem for that is found higher up in the error message:

ValueError: cannot reindex or align along dimension 'time' because the index has duplicate values

The above exception was the direct cause of the following exception:

I think this is related to https://github.com/jbusecke/cmip6_preprocessing/issues/106. Ill quickly check a few things and get back to you.

When I ran your code just for "E3SM-1-1" I did not get any errors, could you confirm that is the same for you?

jbusecke commented 3 years ago

Yep, as suspected the time in one of the files is messed up. So what I did to diagnose this is the following:

import intake
from cmip6_preprocessing.preprocessing import combined_preprocessing

col = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)

variables = ["dissic", "talk", "so", "thetao"]
# z_kwargs = {"consolidated": True, "decode_times": False}
z_kwargs = {"consolidated": True, "use_cftime": True}
query = dict(
    experiment_id=["historical"],
    table_id=["Omon"],
    variable_id=variables,
    grid_label=["gr"],
    source_id=[
#         "GFDL-ESM4",
#         "CESM2-FV2",
#         "CESM2",
#         "MRI-ESM2-0",
#         "CESM2-WACCM-FV2",
#         "GFDL-CM4",
#         "CESM2-WACCM",
#         "E3SM-1-1-ECA",
#         "E3SM-1-1",
        "E3SM-1-0",
#         "NorCPM1",
    ],
)
cat = col.search(**query)
# print(cat.df['source_id'].unique())
dset_dict = cat.to_dataset_dict(
    zarr_kwargs=z_kwargs,
    storage_options={"token": "anon"},
    preprocess=combined_preprocessing,
    aggregate=False
)

Then I quickly plotted the first difference of the time dimension (this should be >0 at all times):

import matplotlib.pyplot as plt

for name, ds in dset_dict.items():
    plt.figure()
    plt.plot(ds.time.diff('time'))
    plt.title(name)

This looks good for all datasets, except for the salinity of one member

image

I am not exactly sure what is going on there (will open a separate issue in the cmip6_preprocessing repo), but for now I would recommend to avoid member "r5... for now.