WCRP-CORDEX / data-request-table

Machine readable data request tables
MIT License

update comments using CMIP6 comments where possible #23

Closed larsbuntemeyer closed 5 months ago

larsbuntemeyer commented 6 months ago

closes #20

Uses the following script:

import pandas as pd

def retrieve_cmip6_mip_tables():
    """retrieve and concat all cmip6 mip tables from
    https://c6dreq.dkrz.de/docs/CMIP6_MIP_tables.xlsx
    """
    cols = [
        "frequency",
        "modeling_realm",
        "standard_name",
        "units",
        "cell_methods",
        "cell_measures",
        "long_name",
        "comment",
        "dimensions",
        "out_name",
        "type",
        "positive",
        "valid_min",
        "valid_max",
        "ok_min_mean_abs",
        "ok_max_mean_abs",
        "cmip6_table",
    ]
    cmip6_mip_tables_url = "https://c6dreq.dkrz.de/docs/CMIP6_MIP_tables.xlsx"
    tables = pd.read_excel(cmip6_mip_tables_url, sheet_name=None)
    del tables["Notes"]

    def add_table_name(df, table):
        df["cmip6_table"] = table
        return df

    df = pd.concat(add_table_name(df, table) for table, df in tables.items())
    df.rename(
        columns={
            "CF Standard Name": "standard_name",
            "Long name": "long_name",
            "Variable Name": "out_name",
        },
        inplace=True,
    )
    return df[cols].drop_duplicates(ignore_index=True)

df = pd.read_csv("CORDEX-CMIP6/data-request.csv")
cols = df.columns.to_list()

cmip6 = retrieve_cmip6_mip_tables()
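# skip these variables (ps has ambiguous comments); copy so the filtered frame can be modified safely
cmip6 = cmip6[~cmip6.out_name.isin(["ps", "hus850", "pr", "prc"])].copy()
# drop the trailing "Pt" suffix from the frequency, e.g. "6hrPt" -> "6hr"
cmip6["frequency"] = cmip6.frequency.str.replace(r"Pt$", "", regex=True)
cmip6_comments = cmip6.set_index(["out_name", "frequency", "standard_name", "cell_methods", "long_name"])[["comment"]]
# cmip6[cmip6.out_name == "hus850"]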

pd.set_option('display.max_rows', None)
df = df.set_index(["out_name", "frequency", "standard_name", "cell_methods", "long_name"]).join(cmip6_comments, rsuffix="_cmip6")
# note: this overwrites every comment and leaves NaN where CMIP6 has no matching row
df["comment"] = df.comment_cmip6
df = df.drop(columns="comment_cmip6").reset_index()
df[cols].to_csv("CORDEX-CMIP6/data-request.csv", index=False)

larsbuntemeyer commented 6 months ago

this needs more investigation...

larsbuntemeyer commented 5 months ago

@gnikulin This PR would update the variable comments with comments from CMIP6. In CMIP6, however, the comment depends on the table you choose. I also think the comments from the original CORDEX data request were not supposed to go into the tables anyway, so I would remove them. Could you have a look at which variable comments you would like to keep? I think it makes sense for some (e.g. 'tasmax', 'tasmin', ...).
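
To see which variables are affected by the table dependence, one could count the distinct comments per variable across all tables. This is only a minimal sketch: the toy frame below stands in for the output of retrieve_cmip6_mip_tables(), and its comment texts are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the concatenated CMIP6 MIP tables (comment texts are invented).
cmip6 = pd.DataFrame(
    {
        "out_name": ["tas", "tas", "ps", "ps"],
        "cmip6_table": ["Amon", "day", "Amon", "CFday"],
        "comment": [
            "near-surface air temperature",
            "near-surface air temperature",
            "surface pressure, wording A",
            "surface pressure, wording B",
        ],
    }
)

# Count the distinct comments each variable has across all tables.
n_comments = cmip6.groupby("out_name")["comment"].nunique()

# Variables with more than one distinct comment need a manual decision.
ambiguous = sorted(n_comments[n_comments > 1].index)
print(ambiguous)
```

Variables that end up in `ambiguous` are the ones for which a plain join on the variable name cannot pick a single comment.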

gnikulin commented 5 months ago

Almost all comments describing variables in the original CORDEX DR should be included in the CORDEX-CMIP6 CMOR tables. They describe different aspects of the variables better than the CMIP6 comments do. Some of them are based on the CMIP6 tables but with additional explanations. Of course, we don't need comments like "daily and monthly means" (as for siconca) or comments about changes.

Would it be possible to copy comments from CMIP6 for all variables that don't have comments right now?
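
A fill-only-where-missing update could look roughly like the sketch below. It assumes a reduced key of out_name and frequency and uses invented toy rows; the real data request would need the full set of key columns used in the script above.

```python
import pandas as pd

keys = ["out_name", "frequency"]  # reduced key, chosen only for this illustration

# Toy CORDEX data request: one variable has a comment, one does not.
cordex = pd.DataFrame(
    {
        "out_name": ["tasmax", "clt"],
        "frequency": ["day", "day"],
        "comment": ["existing CORDEX comment", None],
    }
)

# Toy CMIP6 comments, reduced to one row per key (keeps the first one found).
cmip6_comments = pd.DataFrame(
    {
        "out_name": ["tasmax", "clt"],
        "frequency": ["day", "day"],
        "comment": ["CMIP6 comment for tasmax", "CMIP6 comment for clt"],
    }
).drop_duplicates(subset=keys)

merged = cordex.merge(cmip6_comments, on=keys, how="left", suffixes=("", "_cmip6"))

# Keep existing CORDEX comments; use the CMIP6 comment only where none exists.
merged["comment"] = merged["comment"].fillna(merged["comment_cmip6"])
result = merged.drop(columns="comment_cmip6")
print(result)
```

Unlike a plain assignment from the CMIP6 column, this leaves every existing CORDEX comment untouched.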

gnikulin commented 5 months ago

If the comment depends on the table, we need to check which table should be used.
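
One way to make the table choice explicit would be a priority list of tables, taking each variable's comment from the highest-priority table that defines it. The order in `table_priority` below is purely hypothetical, as are the toy comments; which table should actually win is the open question here.

```python
import pandas as pd

# Hypothetical priority: tables listed earlier win. This order is an assumption.
table_priority = ["Amon", "day", "3hr"]

# Toy stand-in for the concatenated CMIP6 MIP tables (comment texts are invented).
cmip6 = pd.DataFrame(
    {
        "out_name": ["ps", "ps", "tas"],
        "cmip6_table": ["day", "Amon", "day"],
        "comment": ["ps comment from day", "ps comment from Amon", "tas comment from day"],
    }
)

# Rank each row by its table; rows from tables outside the list are dropped.
rank = {table: i for i, table in enumerate(table_priority)}
ranked = cmip6.assign(rank=cmip6["cmip6_table"].map(rank)).dropna(subset=["rank"])

# Keep, per variable, the comment from the best-ranked table.
chosen = (
    ranked.sort_values("rank", kind="stable")
    .drop_duplicates(subset="out_name", keep="first")
    .set_index("out_name")["comment"]
)
print(chosen)
```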

larsbuntemeyer commented 5 months ago

OK, we can keep the comments from the tables on the CORDEX data request homepage. It just seemed to me that comments such as "requested for urban modeling" for ta50m were very arbitrary (and not meant to end up in the NetCDF file)...

Another example would be to replace the CORDEX data request comment on od550aer, which currently is "long_name fixed, 2022.09.22 https://github.com/WCRP-CORDEX/cordex-cmip6-data-request/issues/6", with the CMIP6 comment "AOD from ambient aerosols (i.e., includes aerosol water). Does not include AOD from stratospheric aerosols if these are prescribed but includes other possible background aerosol types. Needs a comment attribute 'wavelength: 550 nm'", which seems to make more sense.

However, this would probably have to be figured out per variable...

gnikulin commented 5 months ago

Of course, the current comments for ta50m and od550aer must be deleted because they are internal comments that do not describe the variables. Here, we can only decide per variable.
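
A per-variable cleanup could be kept as an explicit list of variables whose comments are internal notes. The sketch below blanks the comments for the two variables named in this thread; the `internal` set and the tasmax row are illustrative assumptions, not a reviewed list.

```python
import pandas as pd

# Variables whose current comments are internal notes (illustrative, from this thread).
internal = {"ta50m", "od550aer"}

# Toy data request rows; the tasmax comment is a placeholder.
df = pd.DataFrame(
    {
        "out_name": ["ta50m", "od550aer", "tasmax"],
        "comment": [
            "requested for urban modeling",
            "long_name fixed, 2022.09.22",
            "placeholder comment describing tasmax",
        ],
    }
)

# Remove only the internal comments; all other comments stay as they are.
df.loc[df["out_name"].isin(internal), "comment"] = None
print(df)
```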

larsbuntemeyer commented 5 months ago

I'll close this PR then; I'll have to think of something new...