WCRP-CORDEX / data-request-table

Machine readable data request tables
MIT License

update surface and soil cell methods #28

Closed larsbuntemeyer closed 6 months ago

larsbuntemeyer commented 6 months ago

closes #26

Update the cell methods of surface variables to area: mean where land, based on a comparison with CMIP6. In CMIP6 the area cell method actually depends on the table, so variables might have different area cell methods depending on their MIP. For CORDEX, I excluded clt (total cloud cover) as an exception, since it does not make sense in CORDEX to restrict it to land...

import pandas as pd
import re
import yaml

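def parse_cell_methods(cm_string):
    # Put each "name:" token on its own line so the string can be read as YAML, see
    # https://stackoverflow.com/questions/52340963/how-to-insert-a-newline-character-before-a-words-that-contains-a-colon
    ys = re.sub(r"(\w+):", r"\n\1:", cm_string).strip()
    d = yaml.safe_load(ys)
    # "area: time: mean" leaves area empty; it shares the method given for time.
    if "area" in d and d["area"] is None:
        d["area"] = d["time"]
    return d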

def encode_cell_methods(cell_methods):
    return " ".join([f"{k}: {v}" for k, v in cell_methods.items()])

def update_cell_methods(cell_methods):
    cm = parse_cell_methods(cell_methods)
    cm["area"] = "mean where land"
    return encode_cell_methods(cm)
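
To make the effect of these helpers concrete, here is a minimal standard-library sketch of the same parse, update and encode round trip. It swaps the YAML step for plain regex tokenizing, so it is an illustration and not the code used in this PR. Like the original, it assumes the string contains no parenthesised comments such as "(comment: ...)", which would be mistaken for dimension names.

```python
import re

def parse_cell_methods(cm_string):
    """Split a cell_methods string into {dimension: method} (regex variant, no YAML)."""
    tokens = list(re.finditer(r"(\w+):", cm_string))
    parsed = {}
    for i, tok in enumerate(tokens):
        # The method text runs from the end of this "name:" to the start of the next one.
        end = tokens[i + 1].start() if i + 1 < len(tokens) else len(cm_string)
        parsed[tok.group(1)] = cm_string[tok.end():end].strip()
    # In "area: time: mean" the area entry is empty and shares the following method.
    names = list(parsed)
    for i in range(len(names) - 2, -1, -1):
        if not parsed[names[i]]:
            parsed[names[i]] = parsed[names[i + 1]]
    return parsed

def encode_cell_methods(cell_methods):
    return " ".join(f"{k}: {v}" for k, v in cell_methods.items())

def update_cell_methods(cell_methods):
    cm = parse_cell_methods(cell_methods)
    cm["area"] = "mean where land"
    return encode_cell_methods(cm)

print(update_cell_methods("area: time: mean"))
# area: mean where land time: mean
```

Applying the update to a string that already reads "area: mean where land time: mean" returns it unchanged, so the update can be re-run safely.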

and then, to select the variables and apply the update:

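# Variables to leave unchanged.
exceptions = ["clt", "evspsbl", "hfls", "hfss"]
# cmip6 is a DataFrame of CMIP6 table entries, assumed to be loaded beforehand (not shown).
cmip6 = cmip6[cmip6["cell_methods"].notna()]
land_mask = cmip6.cell_methods.str.contains("mean where land")
mean_where_land = list(cmip6[~cmip6.out_name.isin(exceptions) & land_mask].out_name.unique())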

df = pd.read_csv("CORDEX-CMIP6/data-request.csv")
cond = df.out_name.isin(mean_where_land)

df.loc[cond, "cell_methods"] = df[cond].cell_methods.apply(update_cell_methods)

larsbuntemeyer commented 6 months ago

@gnikulin the following variables will have area: mean where land:

['evspsbl',
 'tsl',
 'mrros',
 'mrro',
 'snm',
 'hfls',
 'hfss',
 'mrfso',
 'mrsfl',
 'mrso',
 'mrsos',
 'mrsol',
 'snw',
 'snd',
 'evspsblpot',
 'mrsofc']

gnikulin commented 6 months ago

I think we need to exclude several other variables, e.g. evspsbl. I will check the other variables.

larsbuntemeyer commented 6 months ago

I think it's not too bad: as far as a quick check shows, potential evapotranspiration only makes sense above land surfaces. The rest of the variables also relate to land, surface, soil or snow.

gnikulin commented 6 months ago

Yes, potential evapotranspiration only makes sense above land surfaces, while evspsbl in general makes sense above both water and land surfaces. We need to check all snow-related variables; e.g. snow on ice (sea and lake) may also make sense. hfls and hfss in the Amon table have "cell_methods": "area: time: mean".

gnikulin commented 6 months ago

We need to exclude evspsbl, hfls and hfss; the other variables should have area: mean where land.

larsbuntemeyer commented 6 months ago

Agreed, the list is updated:

['tsl',
 'mrros',
 'mrro',
 'snm',
 'mrfso',
 'mrsfl',
 'mrso',
 'mrsos',
 'mrsol',
 'snw',
 'snd',
 'evspsblpot',
 'mrsofc']
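
As a quick sanity check, the updated list should equal the first proposal minus the three variables agreed for exclusion. The sketch below only uses the variable names quoted in this thread; the names proposed, excluded and updated are illustrative.

```python
# First proposal from this thread (16 variables).
proposed = [
    "evspsbl", "tsl", "mrros", "mrro", "snm", "hfls", "hfss", "mrfso",
    "mrsfl", "mrso", "mrsos", "mrsol", "snw", "snd", "evspsblpot", "mrsofc",
]
# Variables agreed to be excluded because they also make sense over water.
excluded = {"evspsbl", "hfls", "hfss"}

# Keep the original order while dropping the excluded names.
updated = [name for name in proposed if name not in excluded]
print(len(updated))
# 13
```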