I'm not convinced though... but hey, that will just make it a bit harder for you guys to mix old and new JSON files...
@doutriaux1 when I sort out #403 we can then bump the PMP version to 2.0
... JSON formats change and this limits backward compatibility
@doutriaux1 it's early days yet in terms of backwards compatibility being critical... if we are going to make structural changes to these JSONs, now is the time. It will break some plotting codes, but it will be worth it. Masking is critical, and it's important to know if we've had to estimate it.
It's OK, I thought about it last night and it shouldn't be too bad to keep it compatible one more time.
@doutriaux1 Great... and currently we don't document whether or not we had to estimate the model mask... so it will be good to trap that.
@doutriaux1 just spoke with @gleckler1 and he bumped this one up to top priority, with #423 (nightlies) as next in line
@doutriaux1 @durack1 note the added info... keeping track of whether or not a mask needed to be automatically generated
@gleckler1 in the JSON? Or do you also want some kind of info stored in the NC file?
@gleckler1 @doutriaux1 @durack1 following this discussion I may have some feedback related to these land/sea mask issues. We are currently working with Patricia Cadule from IPSL (she has just started to use the PMP) to compute metrics on Gross Primary Productivity (gpp). I naturally set "gpp": ["land"] in regions:
regions = {"tas" : [None,"land","ocean"],
"uas" : [None,"land","ocean"],
"vas" : [None,"land","ocean"],
"pr" : [None,"land","ocean"],
"psl": [None,"land","ocean"],
"huss": [None,"land","ocean"],
"emp": [None,"land","ocean"],
"prw": [None,"land","ocean"],
"gpp" : ["land"],
}
And here is the error I get:
(PMP_nightly) jservon@ciclad-ng:/data/jservon/Evaluation/PMP_Patricia> pcmdi_metrics_driver.py -p clean_test4_params_gpp_JS.py
parameter file ref is: default
ref is: ['default']
couldn't figure out obs mask name from obs json file
REGION: {'id': 'land', 'value': 100}
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/cdms2/avariable.py:1102: Warning:
avariable.regrid: We chose regridMethod = linear for you among the following choices:
'conserve' or 'linear' or 'patch'
warnings.warn(message, Warning)
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/ESMP/src/ESMP_API.py:1241: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
if (srcMaskValues != None):
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/ESMP/src/ESMP_API.py:1248: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
if (dstMaskValues != None):
/home/jservon/anaconda2/envs/PMP_nightly/lib/python2.7/site-packages/cdutil/create_landsea_mask.py:130: MaskedArrayFutureWarning: setting an item on a masked array which has a shared mask will not copy the mask and also change the original mask array in the future.
Check the NumPy 1.11 release notes for more information.
mask2[1:-1,1:-1] = MV2.where(m,1,mask[1:-1,1:-1])
OBS SHAPE IS (12, 72, 144)
Failed to get variable gpp for version: BNU-ESM, error:
Inconsistant shape between the condition and the input (got (1, 73, 96) and (12, 73, 96))
Results saved to JSON file: /data/jservon/Evaluation/PMP_Patricia/metrics_results/gpp_2.5x2.5_esmf_linear_metrics.json
Done. Check log at: /data/jservon/Evaluation/PMP_Patricia/metrics_results/errors_log.txt
We do not provide a mask and set generate_sftlf = True
If we set "gpp":[None] in regions we do not have the error but as far as I understand we rely on the fact that the gpp is originally masked (missing values over the ocean) in both the model and the reference (or at least one of them? Would that be enough?)
Is it an expected behaviour? (and should I move this post to create a new issue?)
I've just checked on other variables (tas, pr...), and I confirm that I can't get either 'ocean' or 'land' (with generate_sftlf = True).
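A quick way to check whether a given input file is already masked over the ocean (a minimal sketch; the file name below is hypothetical):
import cdms2
import numpy

f = cdms2.open("gpp_BNU-ESM_historical_r1i1p1_clim.nc")  # hypothetical file name
gpp = f("gpp")
f.close()
# If the ocean is already masked in the source data, a large fraction of the
# grid points (roughly the ocean fraction) should already be masked.
print(float(numpy.ma.count_masked(gpp)) / gpp.size)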
@doutriaux1 @jservonnat Not obvious if it's the PMP software or something not quite right in your parameter file. The whole point of generate_sftlf is to be able to get land or ocean values, and it's been working fine. We need to track down why it's not working for tas and pr before trying a new variable.
I think that might be a numpy 1.11 issue; pinging @dnadeau4
@gleckler1 @doutriaux1 Here is the content (without comments) of my param file:
import genutil, os.path, time
import cdutil  # needed for cdutil.region.domain in regions_specs below
from datetime import datetime
import pcmdi_metrics
case_id = 'allcmip5_gpp'
model_versions = [
'BNU-ESM']
period = '1981_2004'
realization = 'r1i1p1'
experiment = 'historical'
generate_sftlf = True
filename_template = "%(model_version)_"+experiment+".mon.land.Lmon.%(realization).%(variable)_%(period)_96x73_cdo_ymonmean.nc"
model_clims_interpolated_output = 'model_clims_interpolated_output/'
filename_output_template = "Metrics_%(model_version)."+experiment+"."+realization+".mo.%(table).%(variable).ver-1.%(period).%(region).AC.nc"
save_mod_clims = False # True or False
mod_data_path = '/data/cpipsl/CMIP5_TRANSPORT/DATA/LAND/gpp/r1i1p1/ymonmean/'
custom_observations='/home/jservon/Evaluation/PCMDI-MP/template_param_file/obs_ipsl_info_dictionary.json'
obs_data_path = '/data/jservon/Evaluation/ReferenceDatasets/PMP_obs/obs'
metrics_output_path = '/data/jservon/Evaluation/PMP_Patricia/metrics_results/'
ref = 'default' #,'alternate1','alternate2']
str_vars = 'gpp'
vars = str.split(str_vars,',')
regions = {"tas" : [None,"land","ocean"],
"uas" : [None,"land","ocean"],
"vas" : [None,"land","ocean"],
"pr" : [None,"land","ocean"],
"psl": [None,"land","ocean"],
"huss": [None,"land","ocean"],
"emp": [None,"land","ocean"],
"prw": [None,"land","ocean"],
"gpp" : [None,"land"],
}
regions_specs = {"Nino34":
{"value": 0.,
"domain": cdutil.region.domain(latitude=(-5., 5., "ccb"), longitude=(190., 240., "ccb"))},
"NAM": {"value": 0.,
"domain": {'latitude': (0., 45.), 'longitude': (210., 310.)},
}
}
targetGrid = '2.5x2.5' # OPTIONS: '2.5x2.5' or an actual cdms2 grid object
regrid_tool = 'esmf' # OPTIONS: 'regrid2','esmf'
regrid_method = 'linear' # OPTIONS: 'linear','conservative', only if tool is esmf
regrid_tool_ocn = 'esmf' # OPTIONS: "regrid2","esmf"
regrid_method_ocn = 'linear' # OPTIONS: 'linear','conservative', only if tool is esmf
generate_sftlf = True
model_tweaks = {}
Do you see something wrong in there?
@doutriaux1 then you suggest to update numpy (or at least force version 1.11)?
No, I think it's because we updated to numpy 1.11. You can try to re-install the metrics with:
conda create -n BLAH -c conda-forge -c uvcdat uvcdat-nox numpy=1.9
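Once the environment is created, you can confirm which numpy you actually got with:
python -c "import numpy; print(numpy.__version__)"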
Let me know
@doutriaux1 Ok I'll do this.
I will be looking at it after the AGU meeting.
@doutriaux1 I've checked today and it doesn't fix the problem for me (same error). @gleckler1 @durack1 @doutriaux1 don't you guys have the same problem with the latest version of the PMP?
@jservonnat @doutriaux1 JS you've mixed issues... this issue (see origin) is for CD to add a "mask" layer to the mean climate JSON structure with options "all", "land", "ocean", "land-estimated" or "ocean-estimated"... depending on whether an automatically generated mask was used... This is labeled as PRIORITY1...
With regard to you not being able to use masked data, as you described above, I can't reproduce the error... I just downloaded the latest PMP and it's working fine for me. We'll need your code/data to reproduce. If you are still having problems with this, please create a separate issue from this one!
@doutriaux1 After getting the PMPParser (but NOT yet the rest of the CDP) into the PMP's master on GitHub, this issue (see beginning) is still "Priority 1".
@gleckler1 will work on this today. @zshaheen is dealing with the parser issue.
@doutriaux1 Please, this is from October and is still "priority 1"... we need to add a layer to deal with masking just after regions (with options "all", "land", "ocean", "land-estimated" or "ocean-estimated")
Current "json_structure": ["model", "reference", "rip", "region", "statistic", "season"],
CHANGE TO: "json_structure": ["model", "reference", "rip", "region", "mask", "statistic", "season"],
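For reference, with that structure a results entry would nest something like the following (the model, reference, rip and region names are taken from elsewhere in this thread; the top-level "RESULTS" key, the statistic and season names, and the value are purely illustrative):
"RESULTS": {
    "GFDL-ESM2G": {
        "default": {
            "r1i1p1": {
                "NHEX": {
                    "land-estimated": {
                        "rms_xy": {
                            "ann": 1.234
                        }
                    }
                }
            }
        }
    }
}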
will do today
@gleckler1 looking into this
Right now we have regions defined as "Nino3"; should that become {"Nino3": {"ocean": {blah}}}?
What do you want to call the no-mask regions? And what about in the future, when the code allows passing values such as 234, or unions of values ['234', '235']? What should the name of the mask be?
I'm not sure it's a great idea. "global", "ocean", "land" at the same level seem to make more sense than {"glob": {"nomask": {}, "land": {}, "ocean": {}}}.
Even for special cases like NH I think 'NH', 'NH_ocean' and 'NH_land' make as much sense as {'NH': {"nomask": {}, "land": {}, "ocean": {}}}.
Besides, with so many layers, assuming we go with the extra "mask" layer, it might be better to structure it as: "json_structure": ["model", "reference", "rip", "mask", "region", "statistic", "season"],
hence grouping all "ocean" studies together.
Thoughts?
@doutriaux1 open to suggestions on how best to deal with json syntax. But somehow, we need to trap info to distinguish if a mask was used, and if so, was it from a "predefined" mask (e.g., land/sea), or was it "estimated" via the PMP.
@gleckler1 we have a full section of the JSON that stores this information:
"RegionalMasking": {
"Nino34": {
"domain": {
"Nino34": "cdutil.region.domain(latitude=(-5.0, 5.0, 'ccb'),longitude=(190.0, 240.0, 'ccb'))"
},
"id": "Nino34",
"value": 0.0
},
"TROPICS": {
"domain": {
"TROPICS": "cdutil.region.domain(latitude=(-30.0, 30, 'ccb'))"
},
"id": "TROPICS"
},
"global": {
"id": "global"
},
"ocean": {
"id": "ocean",
"value": 0
},
"NHEX": {
"domain": {
"NHEX": "cdutil.region.domain(latitude=(30.0, 90, 'ccb'))"
},
"id": "NHEX"
},
"NAM": {
"domain": {
"latitude": [
0.0,
45.0
],
"longitude": [
210.0,
310.0
]
},
The one piece of information missing here (that I can think of) is the path to the mask used (or whether it was self-generated); I can add this if you want.
@doutriaux1 OK, agreed, working with this would be better than altering the JSON structure for statistics... how about something like the addition for "ocean" below...
"RegionalMasking": { "Nino34": { "domain": { "Nino34": "cdutil.region.domain(latitude=(-5.0, 5.0, 'ccb'),longitude=(190.0, 240.0, 'ccb'))" }, "id": "Nino34", "value": 0.0 }, "TROPICS": { "domain": { "TROPICS": "cdutil.region.domain(latitude=(-30.0, 30, 'ccb'))" }, "id": "TROPICS" }, "global": { "id": "global" }, "ocean": { "id": "ocean", "value": 0 "mask": FILENAME OR ESTIMATED }, "NHEX": { "domain": { "NHEX": "cdutil.region.domain(latitude=(30.0, 90, 'ccb'))" }, "id": "NHEX" }, "NAM": { "domain": { "latitude": [ 0.0, 45.0 ], "longitude": [ 210.0, 310.0 ] },
@gleckler1 yes. The only issue is that "RegionalMasking" is at the top level, hence for each model it might be different, i.e. if we have passed a mask path that does not exist for some model then we will end up with a mix of "path" and "self-generated" masks.
But I'll come up with a solution soon: either I will add a field "mask" into the regional masking and map each model to its used mask, or I'll add it at the model/region level.
Back to you soon on this.
@gleckler1 turns out it is already here!
"InputRegionMD5": "e3b7143d72d6b4824e20dabd80acd869",
"InputClimatologyFileName": "tas_GFDL-ESM2G_Amon_historical_r1i1p1_198501-200512-clim.nc",
"units": "N/A",
"InputClimatologyMD5": "fe00a043c990b2cfd8ff1e2f9e9c4cc0",
"InputRegionFileName": "sftlf_GFDL-ESM2G.nc"
So I guess we are good to close, no?
In case it is auto-generated, it will be set to null:
"InputRegionMD5": null,
"InputClimatologyFileName": "tas_GFDL-ESM2G_Amon_piControl_000101-010012-clim01.xml",
"units": "N/A",
"InputClimatologyMD5": "d0263797666880c69f2c4f4e6a929827",
"InputRegionFileName": null
@doutriaux1 OK, will try working with this... have you needed to make any changes, or did you just figure out this is already in place (in master)?
it's in master! I love it when it's that easy! Let's wait to make sure you're happy with it before closing.
@gleckler1 @doutriaux1 sorry for this long period of silence but I'm back on this issue ;). I'm not sure I totally understand where you are right now, but I would like to be sure that the land-sea masking, i.e. land only (land), ocean only (ocean), or everything (global or nomask), is not at the same level in the JSON structure as the geographical domain (i.e. TROPICS, NHEX, Nino34...). Can you confirm that?
@jservonnat they are at the same level
Otherwise some fields would be redundant, e.g. Nino34 would have a useless "ocean" subsection.
Masking regions are not going to stay just None/land/ocean forever; I can foresee complicated region selection schemes, so the "region" name alone should reflect that, with the region described in its own dedicated section.
@doutriaux1 I agree with your last point. There should be a way to distinguish between a geographical domain, or 'region', and the fact that we compute a statistic, for instance, on land only or ocean only over the same geographical domain. And thus I'm not really convinced by the previous point ;), that having Nino34 for the region and ocean for the mask is redundant information. It is for sure redundant, but it is still correct. The issue for me is that we no longer separate land and ocean on, say, NHEX or TROPICS. We used to make this separation before and I find it clearer.
@doutriaux1 @gleckler1 Concerning the example of the Nino34 region, the default mask should be something like 'nomask' or 'global', meaning that no particular masking is applied before the statistics are computed. From the user's point of view, it is not mandatory to specify that Nino34 is ocean only; we only need to specify it when we actually require different masks. In other words, the user is responsible for masking or not.
@doutriaux1 I read your last comment a second time ;). So you propose that the name of the region includes the mask, right? Like, for instance, 'NHEX_land' or 'GLB_ocean'?
@jservonnat @doutriaux1 just to chime in, @gleckler1 and I are working toward (ocean) basin sub-regions and it would be very useful to be able to generate a polygon (lat/lon pairs) that we can store in the "regions" file... These may not be regular shapes.
@durack1 @gleckler1 @doutriaux1 I'm in for any answer to this issue as long as we are able to differentiate between land only and ocean only ;).
@jservonnat @doutriaux1 @durack1 Welcome back JS... we are still here ;-)
A few issues....
We definitely must be able to separate land and ocean... and right now... "yes we can"
For now, we are able to define rectangular domains and then superimpose a land or ocean mask on them. This takes us a long way (see the sketch after this list).
What is also important is to document whether an actual mask was used or whether it was "estimated" by the PMP/CDAT...
Polygons will be good but for a next iteration.
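To make that concrete, a rectangular domain with a land mask superimposed can be expressed in the parameter file roughly like this (a minimal sketch following the regions_specs syntax quoted above; the "NHEX_land" name is just illustrative, and value = 100 selects land points while value = 0 selects ocean, as in the built-in "land"/"ocean" regions):
import cdutil

regions_specs = {
    # Northern-hemisphere extratropics, land points only
    "NHEX_land": {
        "value": 100,
        "domain": cdutil.region.domain(latitude=(30., 90., "ccb")),
    },
}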
@gleckler1 and I'm glad you are ;). OK, so if I get the latest version of the PMP, I should be able to produce those JSON files with regions and masks separated, right? (If yes I will do that tonight; it's getting late here and I'm starting to feel thirsty...)
@jservonnat rub it in... we're still on tea & coffee. Yes, I think regions and masks are separated in the latest release, please verify.
I will tonight!
@gleckler1 @doutriaux1 I've got the nightly version of the PMP and it seems that the JSON structure hasn't changed:
if not args.dry_run:
    OUT.write(
        metrics_dictionary,
        json_structure=["model", "reference", "rip", "region", "statistic", "season"],
        mode="w",
        indent=4,
        separators=(',', ': '))
I'm up for a video conference on Thursday if you want to discuss this directly.
@doutriaux1 @jservonnat JS has convinced PG that we should add a layer to deal with masking just after regions (with options "all", "land", "ocean", "land-estimated" or "ocean-estimated")
Current "json_structure": ["model", "reference", "rip", "region", "statistic", "season"],
CHANGE TO: "json_structure": ["model", "reference", "rip", "region", "mask", "statistic", "season"],
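In the driver, that would presumably amount to changing the write call (and populating the extra level in metrics_dictionary accordingly); a minimal sketch of the write side only, based on the snippet quoted above:
if not args.dry_run:
    OUT.write(
        metrics_dictionary,
        json_structure=["model", "reference", "rip", "region", "mask", "statistic", "season"],
        mode="w",
        indent=4,
        separators=(',', ': '))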