PCMDI / pcmdi_metrics

Open-source Python package for Systematic Evaluation of Climate and Earth System Models
http://pcmdi.github.io/pcmdi_metrics/
BSD 3-Clause "New" or "Revised" License
97 stars 37 forks source link

modfiy json class output to better deal with masked data #443

Closed gleckler1 closed 6 years ago

gleckler1 commented 7 years ago

@doutriaux1 @jservonnat JS has convinced PG that we should add a layer to deal with masking just after regions (with options "all", "land", "ocean", "land-estimated" or "ocean-estimated")

Current "json_structure": ["model", "reference", "rip", "region", "statistic", "season"],

CHANGE TO: "json_structure": ["model", "reference", "rip, "region", "mask", "statistic", "season"],

doutriaux1 commented 7 years ago

I'm not convinced though... But hey that will just make it a bit harder for you guys to mix old and new json files...

durack1 commented 7 years ago

@doutriaux1 when I sort out #403 we can then bump the PMP version to 2.0.. json formats change and this limits backward compatibility

gleckler1 commented 7 years ago

@doutriaux1 its early days yet in terms of backwards compatibility being critical ... if we are going to make structural changes to these jsons now is the time. It will break some plotting codes, but it will be worth it. Masking is critical, and its important to know if we've had to estimate it.

doutriaux1 commented 7 years ago

it's ok I thought about it last night and it shouldn't be too bad to keep it compatible one more time.

gleckler1 commented 7 years ago

@doutriaux1 Great... and currently we don't document whether or not we had to estimate the model mask... thus will be good to trap.

durack1 commented 7 years ago

@doutriaux1 just spoke with @gleckler1 and he bumped this one up to top priority.. With #423 (nightlies) as next in line

gleckler1 commented 7 years ago

@doutriaux1 @durack1 note the add on info.... keeping track of whether or not a mask needed to be automatically generatted

doutriaux1 commented 7 years ago

@gleckler1 in json? Or do you also want some kind of info stored in NC file?

jservonnat commented 7 years ago

@gleckler1 @doutriaux1 @durack1 following this discussion I might have a feedback related with those issues of land/sea masks. We are currently working with Patricia Cadule from IPSL (she has just started to use the PMP) to compute metrics on the Gross Primary Productivity (gpp). I naturally set "gpp":["land"] in regions:

regions = {"tas" : [None,"land","ocean"],
           "uas" : [None,"land","ocean"],
           "vas" : [None,"land","ocean"],
           "pr" :  [None,"land","ocean"],
           "psl":  [None,"land","ocean"],
           "huss": [None,"land","ocean"],
           "emp":  [None,"land","ocean"],
           "prw":  [None,"land","ocean"],
           "gpp" : ["land"],
          }

And here is the error I get:

(PMP_nightly) jservon@ciclad-ng:/data/jservon/Evaluation/PMP_Patricia> pcmdi_metrics_driver.py -p clean_test4_params_gpp_JS.py parameter file ref is:  default
 ref is:  ['default']
 couldn't figure out obs mask name from obs json file
 REGION: {'id': 'land', 'value': 100}
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/cdms2/avariable.py:1102: Warning:
avariable.regrid: We chose regridMethod = linear for you among the following choices:
    'conserve' or 'linear' or 'patch'
  warnings.warn(message, Warning)
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/ESMP/src/ESMP_API.py:1241: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  if (srcMaskValues != None):
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/ESMP/src/ESMP_API.py:1248: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  if (dstMaskValues != None):
^@/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/cdutil/create_landsea_mask.py:130: MaskedArrayFutureWarning: setting an item on a masked array which has a shared mask will not copy the mask and also change the original mask array in the future.
Check the NumPy 1.11 release notes for more information.
  mask2[1:-1,1:-1] = MV2.where(m,1,mask[1:-1,1:-1])
 OBS SHAPE IS  (12, 72, 144)
 Failed to get variable gpp for version: BNU-ESM, error:
Inconsistant shape between the condition and the input (got (1, 73, 96) and (12, 73, 96))
Results saved to JSON file: /data/jservon/Evaluation/PMP_Patricia/metrics_results/gpp_2.5x2.5_esmf_linear_metrics.json
 Done. Check log at: /data/jservon/Evaluation/PMP_Patricia/metrics_results/errors_log.txt

We do not provide a mask and set generate_sftlf = True

If we set "gpp":[None] in regions we do not have the error but as far as I understand we rely on the fact that the gpp is originally masked (missing values over the ocean) in both the model and the reference (or at least one of them? Would that be enough?)

Is it an expected behaviour? (and should I move this post to create a new issue?)

jservonnat commented 7 years ago

I've just checked on other variables (tas, pr...), and I confirm that I can't get either 'ocean' or 'land' (with generate_sftlf = True).

gleckler1 commented 7 years ago

@doutriaux1 @jservonnat Not obvious if its the PMP software or something not quite right in your parameter file. The whole point of the generate_sftlf is to be able to get land or ocean values and its been working fine. Need to track down why its not working for tas and pr before trying a new variable.

doutriaux1 commented 7 years ago

I think that might be a numpy 1.11 issue pinging @dnadeau4

jservonnat commented 7 years ago

@gleckler1 @doutriaux1 Here is the content (without comments) of my param file:

import  genutil, os.path, time
from datetime import datetime
import pcmdi_metrics

case_id = 'allcmip5_gpp'
model_versions = [
    'BNU-ESM']

period = '1981_2004'
realization = 'r1i1p1'
experiment = 'historical'

generate_sftlf = True

filename_template = "%(model_version)_"+experiment+".mon.land.Lmon.%(realization).%(variable)_%(period)_96x73_cdo_ymonmean.nc"

model_clims_interpolated_output = 'model_clims_interpolated_output/'
filename_output_template = "Metrics_%(model_version)."+experiment+"."+realization+".mo.%(table).%(variable).ver-1.%(period).%(region).AC.nc"

save_mod_clims = False  # True or False
mod_data_path = '/data/cpipsl/CMIP5_TRANSPORT/DATA/LAND/gpp/r1i1p1/ymonmean/'
custom_observations='/home/jservon/Evaluation/PCMDI-MP/template_param_file/obs_ipsl_info_dictionary.json'
obs_data_path = '/data/jservon/Evaluation/ReferenceDatasets/PMP_obs/obs'

metrics_output_path = '/data/jservon/Evaluation/PMP_Patricia/metrics_results/'

ref = 'default'  #,'alternate1','alternate2']

str_vars = 'gpp'
vars = str.split(str_vars,',')

regions = {"tas" : [None,"land","ocean"],
           "uas" : [None,"land","ocean"],
           "vas" : [None,"land","ocean"],
           "pr" :  [None,"land","ocean"],
           "psl":  [None,"land","ocean"],
           "huss": [None,"land","ocean"],
           "emp":  [None,"land","ocean"],
           "prw":  [None,"land","ocean"],
           "gpp" : [None,"land"],
          }

regions_specs = {"Nino34":
                 {"value": 0.,
                  "domain": cdutil.region.domain(latitude=(-5., 5., "ccb"), longitude=(190., 240., "ccb"))},
                 "NAM": {"value": 0.,
                         "domain": {'latitude': (0., 45.), 'longitude': (210., 310.)},
                         }
                 }

targetGrid        = '2.5x2.5' # OPTIONS: '2.5x2.5' or an actual cdms2 grid object
regrid_tool       = 'esmf' # OPTIONS: 'regrid2','esmf'
regrid_method     = 'linear'  # OPTIONS: 'linear','conservative', only if tool is esmf
regrid_tool_ocn   = 'esmf'    # OPTIONS: "regrid2","esmf"
regrid_method_ocn = 'linear'  # OPTIONS: 'linear','conservative', only if tool is esmf

generate_sftlf = True

model_tweaks = {}

Do you see something wrong in there?

jservonnat commented 7 years ago

@doutriaux1 then you suggest to update numpy (or at least force version 1.11)?

doutriaux1 commented 7 years ago

no I think it's because we updated to numpy 1.11 you can try to re-install the metrics with: conda create -n BLAH -c conda-forge -c uvcdat uvcdat-nox numpy=1.9

Let me know

jservonnat commented 7 years ago

@doutriaux1 Ok I'll do this.

dnadeau4 commented 7 years ago

I wiki be looking at it after AGU meeting.

jservonnat commented 7 years ago

@doutriaux1 I've checked today and it doesn't fix the problem for me (same error). @gleckler1 @durack1 @doutriaux1 you guys do not have the same problem with the latest version of the PMP?

gleckler1 commented 7 years ago

@jservonnat @doutriaux1 JS you've mixed issues... this issue (see origin) is for CD to add a "mask" layer to the mean climate json structure with options "all", "land", "ocean", "land-estimated" or "ocean-estimated" ... depending on whether a automatically generated mask was used... This is labeled at PRIORITY1 ...

With regard to you not being able to use masked data, as you described above, I can't repeat the error... just downloaded latest PMP and its working fine for me. We'll need your codes/data to reproduce. If you are still having an problems with this, please create a separate issue from this one!

gleckler1 commented 7 years ago

@doutriaux1 After getting the PMPParser (but NOT yet the rest of the CDP) into the PMP's master on github this issue (see beginning) is still "Priority 1"

doutriaux1 commented 7 years ago

@gleckler1 will work on this today. @zshaheen is dealing with the parser issue.

gleckler1 commented 7 years ago

@doutriaux1 Please this is from Oct and is still "priority 1" ... we need to add a layer to deal with masking just after regions (with options "all", "land", "ocean", "land-estimated" or "ocean-estimated")

Current "json_structure": ["model", "reference", "rip", "region", "statistic", "season"],

CHANGE TO: "json_structure": ["model", "reference", "rip, "region", "mask", "statistic", "season"],

doutriaux1 commented 7 years ago

will do today

doutriaux1 commented 7 years ago

@gleckler1 looking into this right now we have regions defined as: "Nino3" should that become{"Nino3":{"ocean":{blah}}}?

doutriaux1 commented 7 years ago

what do you want to call the nomask regions? What about in the future when the code allow to pass values such as 234 or unions of values ['234','235'] what the name of the mask should be?

doutriaux1 commented 7 years ago

I'm not sure it's a great idea. "global","ocean","land" at the same level seem to make more sense than {"glob":{"nomask":{},'land':{},'ocean':{}}

Even for special caes like NH i think 'NH','NH_ocean' and 'NH_land' make as much sense as {'NH :{"nomask":{},"land":{},"ocean":{}}

Beside with so many layers assuming with go with he extra "mask" layer, it might be better to structure as: "json_structure": ["model", "reference", "rip, "mask", "region", "statistic", "season"],

hence grouping all "ocean" studies together.

Thoughts?

gleckler1 commented 7 years ago

@doutriaux1 open to suggestions on how best to deal with json syntax. But somehow, we need to trap info to distinguish if a mask was used, and if so, was it from a "predefined" mask (e.g., land/sea), or was it "estimated" via the PMP.

doutriaux1 commented 7 years ago

@gleckler1 we have a full section of the JSON that store this information:

    "RegionalMasking": {
        "Nino34": {
            "domain": {
                "Nino34": "cdutil.region.domain(latitude=(-5.0, 5.0, 'ccb'),longitude=(190.0, 240.0, 'ccb'))"
            },
            "id": "Nino34",
            "value": 0.0
        },
        "TROPICS": {
            "domain": {
                "TROPICS": "cdutil.region.domain(latitude=(-30.0, 30, 'ccb'))"
            },
            "id": "TROPICS"
        },
        "global": {
            "id": "global"
        },
        "ocean": {
            "id": "ocean",
            "value": 0
        },
        "NHEX": {
            "domain": {
                "NHEX": "cdutil.region.domain(latitude=(30.0, 90, 'ccb'))"
            },
            "id": "NHEX"
        },
        "NAM": {
            "domain": {
                "latitude": [
                    0.0,
                    45.0
                ],
                "longitude": [
                    210.0,
                    310.0
                ]
            },
doutriaux1 commented 7 years ago

the one information missing here (I can think of) is the path to the mask used (or if it's been self generated) I can add this if you want.

gleckler1 commented 7 years ago

@doutriaux1 OK, agreed, working with this would be better than altering json structure for statistics... how about something like addition for "ocean" below...

"RegionalMasking": { "Nino34": { "domain": { "Nino34": "cdutil.region.domain(latitude=(-5.0, 5.0, 'ccb'),longitude=(190.0, 240.0, 'ccb'))" }, "id": "Nino34", "value": 0.0 }, "TROPICS": { "domain": { "TROPICS": "cdutil.region.domain(latitude=(-30.0, 30, 'ccb'))" }, "id": "TROPICS" }, "global": { "id": "global" }, "ocean": { "id": "ocean", "value": 0 "mask": FILENAME OR ESTIMATED }, "NHEX": { "domain": { "NHEX": "cdutil.region.domain(latitude=(30.0, 90, 'ccb'))" }, "id": "NHEX" }, "NAM": { "domain": { "latitude": [ 0.0, 45.0 ], "longitude": [ 210.0, 310.0 ] },

doutriaux1 commented 7 years ago

@gleckler1 yes. The only issue is that "Regional Mask" is at the top level, "hence" for each models it might different, i.e if we have passed a mask path that does not exist for some model then we will end up with a mix of "path" and "Self-generated" masks.

But I'll come up with a solution soon, either I will add a filed "mask" into the regional masking and map each model to its used mask or i'll add it at the model/region level.

Back to you soon on this.

doutriaux1 commented 7 years ago

@gleckler1 turns out it is already here!

            "InputRegionMD5": "e3b7143d72d6b4824e20dabd80acd869",
            "InputClimatologyFileName": "tas_GFDL-ESM2G_Amon_historical_r1i1p1_198501-200512-clim.nc",
            "units": "N/A",
            "InputClimatologyMD5": "fe00a043c990b2cfd8ff1e2f9e9c4cc0",
            "InputRegionFileName": "sftlf_GFDL-ESM2G.nc"
doutriaux1 commented 7 years ago

so i guess we are good to close no?

doutriaux1 commented 7 years ago

in case it is autogenerated is will be set to nulll

            "InputRegionMD5": null,
            "InputClimatologyFileName": "tas_GFDL-ESM2G_Amon_piControl_000101-010012-clim01.xml",
            "units": "N/A",
            "InputClimatologyMD5": "d0263797666880c69f2c4f4e6a929827",
            "InputRegionFileName": null
gleckler1 commented 7 years ago

@doutriaux1 OK, will try working with this... have you needed to make any changes or just figured out this is already in place (in master)?

doutriaux1 commented 7 years ago

it's in master! I love it when it's that easy! Let's wait to make sure you're happy with it before closing.

jservonnat commented 7 years ago

@gleckler1 @doutriaux1 sorry for this long period of silence but I'm back on this issue ;). I'm not sure I totally understand where you are right now, but I would like to be sure that the land-sea masking, i.e. land only (land), ocean only (ocean), or all together (global or nomask) is not at the same level in the json structure as the geographical domain (i.e. TROPICS, NHEX, Nino34...). Can you confirm that?

doutriaux1 commented 7 years ago

@jservonnat they are at the same level

doutriaux1 commented 7 years ago

otherwise some field would be redundant e.g Nino34 would have a useless "ocean" subsection

doutriaux1 commented 7 years ago

masking regions are not going to stay just None/land/ocean for ever, I can foresee complicated region selection schemas, so the "Region" name alone should reflect that with the region described in its own special section.

jservonnat commented 7 years ago

@doutriaux1 I agree with your last point. There should be a way to make the difference between a geographical domain, or 'region', and the fact that we compute a statistic, for instance, on land-only or ocean-only over the same geographical domain. And thus I'm not really convinced by the previous point ;), that is a redundant information when we have Nino34 for the region, and ocean for the mask. It is for sure redundant, but still it is correct. The issue for me is that we do not separate land and ocean on, say, NHEX or TROPICS anymore. We used to do this separation before and I find it clearer.

jservonnat commented 7 years ago

@doutriaux1 @gleckler1 Concerning the example of the Nino34 region, the default mask should be something like 'nomask' or 'global', which means that no particular masking is applied before the computation of the statistics. From the user point of view, it is not mandatory to specify that Nino34 is ocean only. We only need to specify it when we actually require different masks. In other terms, the user is responsible for masking or not.

jservonnat commented 7 years ago

@doutriaux I read a second time your last comment ;). So you propose that the name of the region includes the mask, right? Like for instance 'NHEX_land' or 'GLB_ocean'?

durack1 commented 7 years ago

@jservonnat @doutriaux1 just to chime in, @gleckler1 and I are working toward (ocean) basin sub regions and it would be very useful to be able to generate a polygon (lat/lon pairs) that we can store in the "regions" file.. These may not be regular shapes..

jservonnat commented 7 years ago

@durack1 @gleckler1 @doutriaux1 I'm in for any answer to this issue as long as we are able to differentiate between land only and ocean only ;).

gleckler1 commented 7 years ago

@jservonnat @doutriaux1 @durack1 Welcome back JS... we are still here ;-) A few issues.... We definitely must be able to separate land and ocean... and right now... "yes we can" For now, we are able to define rectangular domains and then superimpose on them a land or ocean mask. This takes us a long way.
What is also important is that we document whether or not an actual mask was used or was it "estimated" by the PMP/CDAT... Polygons will be good but for a next iteration.

jservonnat commented 7 years ago

@gleckler1 and I'm glad you are ;). Ok so if I get the last version of the PMP, I should be able to produce those jsons files with regions and masks separated, right? (if yes I will do that tonight, it's getting late here and I'm starting to feel thirsty...)

gleckler1 commented 7 years ago

@jservonnat rub it in... we're still on tea & coffee. Yes I think regions and masks are separated on latest release, please verify.

jservonnat commented 7 years ago

I will tonight!

jservonnat commented 7 years ago

@gleckler1 @doutriaux1 I've get the nightly version of the PMP and it seems that the json structure hasn't changed.

        if not args.dry_run:
            OUT.write(
                metrics_dictionary,
                json_structure=["model", "reference", "rip", "region", "statistic", "season"],
                mode="w",
                indent=4,
                separators=(
                    ',',
                    ': '))

I'm up for a visio conference on Thursday if you want to discuss about that directly.