EC-Earth / ece2cmor3

Post-processing and cmorization of ec-earth output
Apache License 2.0
13 stars 6 forks source link

DynvarMIP specific variables #670

Closed treerink closed 3 years ago

treerink commented 3 years ago

This issue is related to ec-earth portal issue 898.

For some zonal mean (wave atmos TEM) variables Federico Serva has written a python script to compute these results. See issue 1 in that repository.

What is needed here, is that ece2cmor allows an option in which a (python) script is called instead of the calls to cdo. We aim for an interface via the ifspar.json. An additional option, like the expression option, should allow the external script call.

treerink commented 3 years ago

As a start @goord will investigate in which way such an interface can be created. So in a branch we will just try to create this kind of infrastructure and as a test we take a little script which just does the same cdo as which is used currently.

fserva commented 3 years ago

Hi, as noted in the other script, the external script computes:

The variables in table 126 (from 120 to 132) are tendencies, and as such they should be in ACCUMFLD in grib_codes.json (I guess)

fserva commented 3 years ago

Another point is that in cdoapi.py I don't see: zonmean, enlarge, mul, sub, griddes (with txt in output).

nco is not available in the ece2cmor3 environment right?

treerink commented 3 years ago

@fserva we have been looking into one of the examples: EmonZ utendepfd and as you can see the data request asks this variable on 39 pressure levels (plev39). The plev39 coordinate from the table looks like:

        "plev39": {
            "standard_name": "air_pressure", 
            "units": "Pa", 
            "axis": "Z", 
            "long_name": "pressure", 
            "climatology": "", 
            "formula": "", 
            "must_have_bounds": "no", 
            "out_name": "plev", 
            "positive": "down", 
            "requested": [
                "100000.", 
                "92500.", 
                "85000.", 
                "70000.", 
                "60000.", 
                "50000.", 
                "40000.", 
                "30000.", 
                "25000.", 
                "20000.", 
                "17000.", 
                "15000.", 
                "13000.", 
                "11500.", 
                "10000.", 
                "9000.", 
                "8000.", 
                "7000.", 
                "5000.", 
                "3000.", 
                "2000.", 
                "1500.", 
                "1000.", 
                "700.", 
                "500.", 
                "300.", 
                "200.", 
                "150.", 
                "100.", 
                "70.", 
                "50.", 
                "40.", 
                "30.", 
                "20.", 
                "15.", 
                "10.", 
                "7.", 
                "5.", 
                "3."
            ], 
            "requested_bounds": "", 
            "stored_direction": "decreasing", 
            "tolerance": "", 
            "type": "double", 
            "valid_max": "", 
            "valid_min": "", 
            "value": "", 
            "z_bounds_factors": "", 
            "z_factors": "", 
            "bounds_values": "", 
            "generic_level_name": ""
        }, 

So what is actually the reason that you request this variable on model levels and not directly on the requested pressure levels? If I would ask this variable to a DynvarMIP request and with that run our genecec tool, that is what would happen, the (missing) plev39 levels will be added to the ppt files.

fserva commented 3 years ago

Hi, the 8 variables mentioned here are not available as model output. They need to be computed offline, but this requires computing vertical derivatives. Since increased vertical resolution reduces the numerical error, all 91 levels are used for the computation. Then subsampling of these variables needs to be done. I understand genecec would not be needed in this case right?

fserva commented 3 years ago

The variables defined in table 126 are instead requested on the reduced set of pressure levels you are referring to.

fserva commented 3 years ago

Hello, I just noticed something with the units of the wind tendencies that should be computed offline.

In https://github.com/PCMDI/cmip6-cmor-tables/blob/master/Tables/CMIP6_EdayZ.json, for example utendvtem, it reports:

"utendvtem": {
            "frequency": "day", 
            "modeling_realm": "atmos", 
            "standard_name": "tendency_of_eastward_wind_due_to_advection_by_northward_transformed_eulerian_mean_air_velocity", 
            "units": "m s-1 d-1", 
            "cell_methods": "longitude: mean time: mean", 
            "cell_measures": "", 
            "long_name": "Tendency of Eastward Wind Due to TEM Northward Advection and Coriolis Term", 
            "comment": "Tendency of zonally averaged eastward wind, by the residual northward wind advection (on the native model grid). Reference: Andrews et al (1987): Middle Atmospheric Dynamics. Academic Press.", 
            "dimensions": "latitude plev39 time", 
            "out_name": "utendvtem", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }, 

and same here http://clipc-services.ceda.ac.uk/dreq/01.00.21/u/DynVar.html.

However, the reference paper for this MIP asks for units m s-2, as in most cases I suppose.

The difference is the factor 86400. I've checked a couple of models and they have units of m/s/d.

What should have the precedence for this? @treerink @goord

treerink commented 3 years ago

I have merged the master into the dynvar-tem-script branch and had to resolve two conflicts: one in the ifspar.json and one in the the taskloader.py.

@goord can you review the merged code below in the taskloader.py:

    model_vars = load_model_vars()
    load_masks(model_vars)
    load_scripts(model_vars)
    masks = load_masks(load_model_vars())
    return create_tasks(filtered_matches, get_models(active_components), masks=masks)

I wondered whether the recently introduced masks (for the automatic masked regridding for masked fields) needs any further connection with the load scripts. I actually thought this is disconnected as this part is fully covered by the called script in that case, but the added 'load_masks(model_vars)' in this branch did confuse me a bit. So asking for a check.

goord commented 3 years ago

Hello, I just noticed something with the units of the wind tendencies that should be computed offline.

In https://github.com/PCMDI/cmip6-cmor-tables/blob/master/Tables/CMIP6_EdayZ.json, for example utendvtem, it reports:

"utendvtem": {
            "frequency": "day", 
            "modeling_realm": "atmos", 
            "standard_name": "tendency_of_eastward_wind_due_to_advection_by_northward_transformed_eulerian_mean_air_velocity", 
            "units": "m s-1 d-1", 
            "cell_methods": "longitude: mean time: mean", 
            "cell_measures": "", 
            "long_name": "Tendency of Eastward Wind Due to TEM Northward Advection and Coriolis Term", 
            "comment": "Tendency of zonally averaged eastward wind, by the residual northward wind advection (on the native model grid). Reference: Andrews et al (1987): Middle Atmospheric Dynamics. Academic Press.", 
            "dimensions": "latitude plev39 time", 
            "out_name": "utendvtem", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }, 

and same here http://clipc-services.ceda.ac.uk/dreq/01.00.21/u/DynVar.html.

However, the reference paper for this MIP asks for units m s-2, as in most cases I suppose.

The difference is the factor 86400. I've checked a couple of models and they have units of m/s/d.

What should have the precedence for this? @treerink @goord

@fserva, your script will have to output the velocity tendencies in m/s/d, since that is the unit ec2cmor3 will attach to the variable

goord commented 3 years ago

I have merged the master into the dynvar-tem-script branch and had to resolve two conflicts: one in the ifspar.json and one in the the taskloader.py.

@goord can you review the merged code below in the taskloader.py:

    model_vars = load_model_vars()
    load_masks(model_vars)
    load_scripts(model_vars)
    masks = load_masks(load_model_vars())
    return create_tasks(filtered_matches, get_models(active_components), masks=masks)

I wondered whether the recently introduced masks (for the automatic masked regridding for masked fields) needs any further connection with the load scripts. I actually thought this is disconnected as this part is fully covered by the called script in that case, but the added 'load_masks(model_vars)' in this branch did confuse me a bit. So asking for a check.

Fixed the merged changes in rev. 97f2cf6bb7c6544de5b6c60da7b54e5c0414293a

fserva commented 3 years ago

I will change the units then. However this looks strange since for example utendepfd is m s-2 while only these two variables are m s-1 d-1, which would complicate combining them. Thanks for the tip

goord commented 3 years ago

Hi @fserva I have committed a dummy script make_dynvars.sh in the scripts directory of the dynvar-tem-script branch. This script is called from ece2cmor3 because it has been referenced in the file ifspar.json:

    {
        "source": "215.129",
        "target": "epfy",
        "post-proc": "make_dynvars"
    },
...
    {
        "script": "make_dynvars",
        "src": "make_dynvars.sh",
        "filter": "False"
    }

You can test your script by changing the src field above by the name of your wrapper script. To run, make sure your script is in your PATH. Then you can add all the variables produced by the script to ifspar.json in the same way I defined epfy above, with the same script field. Now ece2cmor3 will call your script whenever these variables are in a data request.

fserva commented 3 years ago

Thanks a lot, I will check this. I imagine that other variables should be 'sourced' sequentially, like 216.129 ... up to 220.129 or something like that. In case I will make a PR if I manage

goord commented 3 years ago

right, you need to 'invent' unused grib codes for these new derived variables. It doesn't really matter which ones, as long as they are unique

fserva commented 3 years ago

Hello, I am testing this and I have two questions @goord.

First, as far as I understand 'external' scripts are to be found in ece2cmor3/scripts/. Can they also be nested there, i.e. can I have a folder (of a certain repository) and the scripts inside that? Secondly, it seems I cannot push a remote branch on ece2cmor3 (permission denied). Is there another way I could share my changes? Thanks a lot

treerink commented 3 years ago

First, as far as I understand 'external' scripts are to be found in ece2cmor3/scripts/. Can they also be nested there, i.e. can I have a folder (of a certain repository) and the scripts inside that?

Yes you can make a dynvar-scripts dir in the scripts dir in your branch

Secondly, it seems I cannot push a remote branch on ece2cmor3 (permission denied). Is there another way I could share my changes? Thanks a lot

I will have a look what your permissions are.

treerink commented 3 years ago

I have sent you a member invitation, you probably first have to confirm by clicking some identification link I guess.

goord commented 3 years ago

Hello, I am testing this and I have two questions @goord.

First, as far as I understand 'external' scripts are to be found in ece2cmor3/scripts/. Can they also be nested there, i.e. can I have a folder (of a certain repository) and the scripts inside that? Secondly, it seems I cannot push a remote branch on ece2cmor3 (permission denied). Is there another way I could share my changes? Thanks a lot

Hi @fserva it's not necessary to add you script to the repo, if you can add it to your PATH before launching ece2cmor3, than it should be found automatically.

fserva commented 3 years ago

Right, you already told me about this! Thanks for the reminder.

goord commented 3 years ago

We should think about documenting this appropriately for others BTW

fserva commented 3 years ago

Absolutely, I am taking a note in my script but perhaps this could be reported in the README file also here

fserva commented 3 years ago

It seems nco was not added yet in the environment. Should this be done? Also scipy is necessary in this case (I cannot remove it entirely)

To avoid breaking things perhaps the environment file could be changed only in this branch...

fserva commented 3 years ago

Hello, I have implemented the changes in my local branch https://github.com/EC-Earth/ece2cmor3/commit/1652c48b3b38fd95d650d3af0bea9ed77dbf4872.

The external scripts work correctly after adding the PATH and PYTHONPATH in the platform. I need now to understand how it can be called through ece2cmor3, and how to pass arguments (exp name, directories, variable names, CMOR tables and so on)

fserva commented 3 years ago

Another update: the variables in the model output files appear to be correctly cmorized.

For those to be produced offline, I am having errors like ERROR:ece2cmor3.taskloader: Variable psitem in table EdayZ is not supported by ifs in ece2cmor3; if you do expect an ec-earth output variable here, please create an issue or pull request on our github page.

I guess I should edit the list-of-identified-missing-cmpi6-requested-variables.xlsx or the pextra version to have it working? (b.t.w., it is ok that it reports cmpi6 instaed of cmip6?

fserva commented 3 years ago

I notice that the variables are listed in list-of-ignored-cmpi6-requested-variables.xlsx so maybe these are filtered; on the other hand, utendnogw is in the same excel but it is cmorized correctly.

goord commented 3 years ago

Hi @fserva, yes your variables should be removed from the ignored/identified missing excel files, these are checked by ece2cmor.

Merry Christmas

fserva commented 3 years ago

Thanks @goord I will check this. I was confused by the fact that some variables were processed despite being in the xlsx.

Happy holidays to you!

fserva commented 3 years ago

While testing (after doing a setup.py install) I am having these kind of errors:

goord commented 3 years ago

Hi @fserva, best wishes to you. I pushed a fix of the typo causing the first error message. I also improved the error handling, so I hope the message from your second error is more useful

fserva commented 3 years ago

Hello @goord happy new year to you! I have commented your commit, it seems the problem is with the external script not getting the required parameters. I also think the wrong (GG) file is passed instead of the SH one.

fserva commented 3 years ago

Hello @goord and @treerink, an update on the integration of the external script. Now the execution works well and a set of files are produced. But there is a problem related I think with the timing of the execution.

The production of derived variables requires ~50min/month, while the cmorization of basic variables is faster (total ~3-4 hours). It seems that ece2cmor did not wait for the external script to end, so I only have the first 6 months processed and the cmorization failed for these variables (the output folder is not removed).

Is there a way to have ece2cmor waiting for the external process? Thanks

PS this is how the log looks like, the cdo commands are halted and execution stops

cdo merge: Processed 1462763520 values from 3 variables over 360 timesteps [28.40s 125MB]
#[18:26:39] make_dynvars.sh[118]> cdo selvar,var130 -ml2plx,101205,100915,100512,99984,99330,98563,97667,96617,9
5409,94044,92522,90844,89013,87038,84925,82681,80316,77840,75267,72607,69870,67073,64229,61350,58449,55540,52646
,49796,47017,44334,41773,39344,37042,34862,32799,30848,29002,27256,25607,24048,22577,21187,19876,18638,17472,163
72,15335,14359,13440,12575,11759,10989,10258,9562,8896,8257,7643,7051,6480,5931,5407,4907,4433,3985,3563,3167,27
99,2457,2141,1852,1588,1350,1137,947,780,634,509,402,312,238,178,130,92,64,43,28,17,10,6,3,1 ./t4cv_t_ml_185006.
nc_1 ./t4cv_t_pl_185006.nc
cdo(2) ml2plx: Process started

2021-01-06 18:27:06 ERROR:ece2cmor3.ifs2cmor: Output file /scratch/ms/it/cc1f/tmp_cmor/t4cv_1850_10290/ifs_1850/
CMIP6/t4cv-ifs-1850/make_dynvars-work/utendvtem_EdayZ.nc of script /perm/ms/it/cc1f/ecearth3/post/tem-diag/make_
dynvars.sh could not be found... skipping cmorization of task
2021-01-06 18:27:06 ERROR:ece2cmor3.ifs2cmor: Output file /scratch/ms/it/cc1f/tmp_cmor/t4cv_1850_10290/ifs_1850/
CMIP6/t4cv-ifs-1850/make_dynvars-work/utendwtem_EdayZ.nc of script /perm/ms/it/cc1f/ecearth3/post/tem-diag/make_
dynvars.sh could not be found... skipping cmorization of task
...other variables skipped...
2021-01-06 18:27:06 WARNING:ece2cmor3.ifs2cmor: Skipped removal of nonempty work directory /scratch/ms/it/cc1f/t
mp_cmor/t4cv_1850_10290/ifs_1850/CMIP6/t4cv-ifs-1850

! ------
! All files were closed successfully. 
! ------
goord commented 3 years ago

Hi @fserva you mean the working directory is removed before your script has finished? I will have a look at it in the evening. In the meantime, you can disable the cleanup by defining the environment variable export ECE2CMOR3_IFS_CLEANUP=FALSE before you launch ece2cmor.

fserva commented 3 years ago

Hi @goord no the working directory is left untouched since non-empty (I've pasted the log now), but ece2cmor finishes the execution without waiting for the slow processing still ongoing. I don't think the lack of SH files is the problem since the data folder and links are still there.

I guess ece2cmor stops the child process when other variables are done

goord commented 3 years ago

Hi @fserva I had a quick look at the code and I think somehow your script has failed after a few months of processing. After your script is finished, ece2cmor3 will proceed to cmorize the nc files you have created, but it can't find those. However, ec2cmor3 will always wait for your script to return because it invokes it via subprocess.check_output (see line 306 of ifs2cmor.py), which is a synchronous function call.

Do you see the file /scratch/ms/it/cc1f/tmp_cmor/t4cv_1850_10290/ifs_1850/CMIP6/t4cv-ifs-1850/make_dynvars-work/utendvtem_EdayZ.nc?

fserva commented 3 years ago

Maybe that shell=False can cause this behaviour (just guessing here)? https://stackoverflow.com/questions/36169571/python-subprocess-check-call-vs-check-output But with shell=True the arguments are not passed.

Also this seems interesting: https://stackoverflow.com/questions/43990219/how-does-subprocess-call-work-with-shell-false

fserva commented 3 years ago

No the /scratch/ms/it/cc1f/tmp_cmor/t4cv_1850_10290/ifs_1850/CMIP6/t4cv-ifs-1850/make_dynvars-work/utendvtem_EdayZ.nc is not produced since that would be generated by concatenating the outputs of month 1 to 12 by a cdo mergetime at the very end of make_dynvars.sh. The problem is that only 5 months and a bit are processed so these files named variable_table.nc are yet to appear.

goord commented 3 years ago

Does your script launch child processes (via the & in bash or subprocess/multiprocess in python)?

Have you tried executing the script outside of ece2cmor, by just passing the arguments yourself?

fserva commented 3 years ago

No, no child processes I think. Only an executable python script which I added to PYTHONPATH as suggested. Yes I have tested days ago and I did not make big changes since then. I will test again just to be sure...

goord commented 3 years ago

Because the process seems to end in the 6th month (not the first), I don't think there is any fundamental flaw with the subprocess call. It could e.g. be that cdo crashes with a zero status code (this shouldn't happen, but you never know with cdo), making ece2cmor3 think that the script execution was ok.

I can also recommend putting debug echo statements between all cdo calls.

fserva commented 3 years ago

Ok I will make another attempt to see if cdo is silently exiting. I've set the verbose mode for the script but there is no information there. It will take some hours, thanks for now

goord commented 3 years ago

From your output above I suspect something goes wrong in the ml2plx call for variable 130. This could be also unsufficient memory or disk space. Are you submitting to the parallel queue with a full node at your disposal?

fserva commented 3 years ago

Running on the fractional one (nf) and now I also take care of removing the big grib/netCDF files which may take too much space. However my $SCRATCH should have enough space. Indeed it may also be the system to kill the process for some reason

goord commented 3 years ago

Also you could comment out all the removals of intermediate files in the script, to get a better idea where things go bad.

Another idea is to remove the 'regular' variables from the varlist, then ece2cmor3 will run only your script variables. This may shine a light on whether there is interference from other tasks.

fserva commented 3 years ago

Thanks for the tips. Good idea about removing variables! I'll try editing the ifspar keeping only a small selection of standard variables. This should clarify what is going wrong with the external process indeed

goord commented 3 years ago

Hi @fserva no need to edit ifspar.json! Just look for the --varlist or --drq option in the ece2cmor call and edit that file, only keeping your tendencies in the data request

fserva commented 3 years ago

Some preliminary results: I have removed most variables from the dreq json file and the execution seems to be going on.

The scripts is running, and all the other variables should have been cmorized (the files are in the working directory named like variable_table.nc; I imagine all the files should be moved from their current to the cmorized folder at the end of the process?

So +1 for the occasional cdo error so far...

Update: currently past month 6, processing month 7. The random cdo error is getting more credit, so I may need to add some trap in the bash script!

goord commented 3 years ago

Yes once all your pre-processed nc-files have been successfully produced, ece2cmor3 will process them and create the cmorized files in the final output directory. You don't have to move them anywhere in your script.

fserva commented 3 years ago

Nope, another failure, apparently similar to the previous one:

#[18:14:55] make_dynvars.sh[59]> fvar=./t4cv_u_ml_185007.nc
#[18:14:55] make_dynvars.sh[61]> cdo -R -f nc -sp2gpl ./t4cv_u_ml_185007.grb ./t4cv_u_ml_185007.nc

2021-01-07 18:17:01 ERROR:ece2cmor3.ifs2cmor: Output file /scratch/ms/it/cc1f/tmp_cmor/t4cv_1850_11723/ifs_1850/
CMIP6/t4cv-ifs-1850/make_dynvars-work/utendvtem_EdayZ.nc of script /perm/ms/it/cc1f/ecearth3/post/tem-diag/make_
dynvars.sh could not be found... skipping cmorization of task

But I may have figured out the problem. Also now runtime is about 4 hours. I think somewhere a limit of 4 hours is set, but evidently it is too short for this. I'll try increasing the time limit to 12 hours... Perhaps this also explain why the scheduler quits without much explanation.

fserva commented 3 years ago

Hello @goord this time the process went further but I believe there is a problem with the cmorization due to the fact that variables are zonal mean.

As you can see the script ends ok...

#[02:28:59] make_dynvars.sh[218]> exit 0

2021-01-08 02:28:59 INFO:ece2cmor3.ifs2cmor: Cmorizing source variable 221 to target variable utendvtem...
2021-01-08 02:29:02 ERROR:ece2cmor3.ifs2cmor: Could not retrieve x-coordinate values from {'gridtype': 'gaussian', 'gridsize': 256, 'ysize': 256, 'yunits': '"degrees_north"', 'yname': 'lat', 'np': 128, 'ylongnam
e': '"latitude"', 'yvals': array([ 89.46282157,  88.76695135,  88.06697165,  87.36606343,

[...]

2021-01-08 02:29:02 ERROR:ece2cmor3.ifs2cmor: Invalid grid detected in post-processed data: {'gridtype': 'gaussian', 'gridsize': 256, 'ysize': 256, 'yunits': '"degrees_north"', 'yname': 'lat', 'np': 128, 'ylongn
ame': '"latitude"', 'yvals': array([ 89.46282157,  88.76695135,  88.06697165,  87.36606343,
[...]
File "/perm/ms/it/cc1f/miniconda2/envs/ece2cmor3/lib/python2.7/site-packages/ece2cmor3-1.6.1-py2.7.egg/ece2cmor3/ifs2cmor.py", line 538, in define_cmor_axes
    grid_id = grid_ids[0]
TypeError: 'NoneType' object has no attribute '__getitem__'

Should I write somewhere that the grid has no x-values, being zonal mean?

goord commented 3 years ago

Hi @fserva, I will take over from here, if you can send me some of your files I can adapt the code to correctly cmorize it. Do you have an ftp server or google drive so I can get at least 1 of the nc files produced by your script?