eguil / Density_bining

Density bining code
2 stars 5 forks source link

Final tweaks to binDensity.py #19

Closed durack1 closed 7 years ago

durack1 commented 9 years ago

So there are a few things to cleanup the code and get it running across all the historical simulations:

Completed:

durack1 commented 9 years ago

This mask issue is related to this difference between pre-function and post-function code: screen shot 2014-09-22 at 09 39 28

eguil commented 9 years ago

actually it comes from the first time step, i.e. K=1 in

plot/over/thick=1 ptopsigmap[K=1:20@ave] plot/over/thick=1 ptopsigmap[K=131:150@ave]

As you can see, the red line is ok.

On 22/9/14 19:56, Paul J. Durack wrote:

This mask issue is related to this difference between pre-function and post-function code: screen shot 2014-09-22 at 09 39 28 https://cloud.githubusercontent.com/assets/3229632/4361144/d27697fc-4281-11e4-8207-34da0ba22ed7.jpeg

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-56412791.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
eguil commented 9 years ago

ok I had a first go at points 4 and 6 below. You can start from this version (on my github and currently in test for IPSL) for the other points.

On 22/9/14 19:30, Paul J. Durack wrote:

So there are a few things to cleanup the code and get it running across all the historical simulations:

  1. Fix mask issue
  2. Write 4D dimension first (so all coords are written in top of file header)
  3. Rename variables to be more descriptive (a->Atl, p->Pac etc)
  4. Add more comments throughout code to explain subsections and what they're doing
  5. Add timeint argument so that debugging can run through a single timestep quickly (rather than ingesting large time chunks)
  6. Chase down memory usage - can well placed |del(var) ; gc.collect()| statements reduce the memory footprint?

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

Ok great, I've just pulled this across into my repo and will start where you finished up..

durack1 commented 9 years ago

I'm guessing you've had trouble getting this to run as you redeclare the so/thetao variables as npy arrays, thereby breaking their cdms2 transient variable status and then leading to calls like x1Bin.long_name = thetao.long_name failing.. I'll try to put this all back in..

durack1 commented 9 years ago

Is there a reason why arrays are preallocated with 'NaN' values rather than valmask? https://github.com/eguil/Density_bining/blob/master/binDensity.py#L534-537 https://github.com/eguil/Density_bining/blob/master/binDensity.py#L760

We should really avoid using NaNs within code, cdms2 doesn't play nicely with them.

durack1 commented 9 years ago

Ok so my latest https://github.com/durack1/Density_bining/commit/8e6d58188b4380299a83c91332949ee13949b8ba should (I've just kicked off a run for testing) solve problems 2 and 5.

I'll still need your help with the mask issue (1) and will also have to get back to renaming the netcdf file variables (3), which will also probably lead me to addressing (4) as I learn what is going on. I've also replaced the newAttribute = so.longname with a new variable handle (not loaded, just referenced, so we can then pick off variable attributes, shapes etc, without loading the variable into memory).

https://github.com/durack1/Density_bining/blob/master/binDensity.py#L321-324 https://github.com/durack1/Density_bining/blob/master/binDensity.py#L959-992 https://github.com/durack1/Density_bining/blob/master/binDensity.py#L1055-1058

durack1 commented 9 years ago

I was just thinking.. This mask issue could be due to different behaviours with NaNs from numpy 1.9.0 (mine) to 1.7.1 (yours).. And is a motivator to address some of the NaN declarations noted above

durack1 commented 9 years ago

Ok so seems with https://github.com/durack1/Density_bining/commit/9d654dc4897bc9d0f643df66427b37708eefc225 I'm now getting this error, which only occurs into the second timestep of the analysis:

Traceback (most recent call last):
  File ".//drive_IPSL.py", line 29, in <module>
    densityBin(modelThetao,modelSo,modelAreacello,outfileDensity)
  File "/export/durack1/git/Density_bining/binDensity.py", line 662, in densityBin
    depthBini[t,ks,:,:]         = regridObj(dy [t,ks,:,:])
UnboundLocalError: local variable 'depthBini' referenced before assignment

I've got a feeling that purging this is causing the issue: https://github.com/durack1/Density_bining/blob/master/binDensity.py#L749

What else shouldn't be purged (or what code can we comment out)?

eguil commented 9 years ago

yep, too violent a purge... On 24/9/14 06:34, Paul J. Durack wrote:

Ok so seems with durack1@9d654dc https://github.com/durack1/Density_bining/commit/9d654dc4897bc9d0f643df66427b37708eefc225 I'm now getting this error, which only occurs into the second timestep of the analysis:

Traceback (most recent call last): File ".//drive_IPSL.py", line 29, in densityBin(modelThetao,modelSo,modelAreacello,outfileDensity) File "/export/durack1/git/Density_bining/binDensity.py", line 662, in densityBin depthBini[t,ks,:,:] = regridObj(dy [t,ks,:,:]) UnboundLocalError: local variable 'depthBini' referenced before assignment

I've got a feeling that purging this is causing the issue: https://github.com/durack1/Density_bining/blob/master/binDensity.py#L749

What else shouldn't be purged (or what code can we comment out)?

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-56623820.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
eguil commented 9 years ago

well, these arrays are used in a numpy interpolation below (npy.interp) and we only want the interpolation to happen on the part of the profile that is defined (i.e. not NaN, see loop below). If we can find a way to tell numpy to ignore the valmask points then we can init the arrays with valmask.

On 24/9/14 03:09, Paul J. Durack wrote:

Is there a reason why arrays are preallocated with 'NaN' values rather than valmask? https://github.com/eguil/Density_bining/blob/master/binDensity.py#L534-537

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-56612727.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
eguil commented 9 years ago

indeed ! because I cannot use the test mode, it takes a while to see obvious bugs...

On 24/9/14 01:43, Paul J. Durack wrote:

I'm guessing you've had trouble getting this to run as you redeclare the so/thetao variables as npy arrays, thereby breaking their cdms2 transient variable status and then leading to calls like |x1Bin.long_name = thetao.long_name| failing.. I'll try to put this all back in..

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-56606919.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
eguil commented 9 years ago

hum... why would this happen only for the first year ? On 24/9/14 06:03, Paul J. Durack wrote:

I was just thinking.. This mask issue could be due to different behaviours with NaNs from numpy 1.9.0 (mine) to 1.7.1 yours.. And is a motivator to address some of the NaN declarations noted above

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-56622420.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
eguil commented 9 years ago

ok, my last commit (https://github.com/eguil/Density_bining/commit/76dfac11f9fb3b1384735e31680913b956dfbeaf) works.

I changed drive_IPSL.py on crunchy to have the timeint='1,12' flag and that speeds up testing

eguil commented 9 years ago

IPSL run using last commit: [This is still 8 GB more than before changes to independent proc.] [ ratio of 15 is not great - can reach 6]

 thetao: units corrected
 CPU of density bining      = 267.8
 CPU of annual mean compute = 171.79
 CPU of interpolation       = 112.2
 CPU of zonal mean          = 171.66
 CPU of persistence compute = 930.28
 CPU of chunk               = 1654.09
 [ Time stamp 24/09/2014 16:17:43 ]
 Max memory use 25.881148 GB
 Ratio to grid*nyears 0.197351116473 kB/unit(size*nyears)
 CPU use, elapsed 24254.94 24350.408776
 Ratio to grid*nyears 15.4125681525 1.e-6 sec/unit(size*nyears)
 Wrote file:  test/cmip5.IPSL-CM5A-LR.historical.r1i1p1.an.ocn.Omon.density.ver-v20111119-compressed.nc
eguil commented 9 years ago

just tested the compressed file done on crunchy and the mask issue for year 1 is gone. So it must indeed be due to the new bumpy. Let me know if replacing nan with valmask would work with npy.interp (but I still don't understand why it would have this specific behaviour...)

durack1 commented 9 years ago

@eguil I have a feeling that it's the NaNs that are causing this issue.. Although it highlights a numpy1.9.0 vs numpy1.7.1 behaviour change that we need to get to the bottom of.

eguil commented 9 years ago

ok - I have now a version without NaNs that looks ok (commit https://github.com/eguil/Density_bining/commit/ff5781d4eeedfdfe1db5191a4affa71159443ef2). Can you please try it with your version of numpy ?

eguil commented 9 years ago

Actually, here is the latest version to use: 81d17f8da39df0bc0bb141c4af55e99b4466de43

eguil commented 9 years ago

there was a bug in the thickness calculation. Now correct with commit 189a5cea46a83549feceecbef4e4e83b816d9360

durack1 commented 9 years ago

Ok so you seem to have things working with https://github.com/eguil/Density_bining/commit/b655466a5f487bab0f082a0fd0253c8889052069 - should I pull this across and run again?

durack1 commented 9 years ago

@eguil so I'm now starting to test across the suite, and have hit a problem with non-standard (lat/lon) grids:

Debug - File names:
thetao: /work/cmip5/historical/ocn/mo/thetao/cmip5.ACCESS1-0.historical.r1i1p1.mo.ocn.Omon.thetao.ver-1.latestX.xml
so    : /work/cmip5/historical/ocn/mo/so/cmip5.ACCESS1-0.historical.r1i1p1.mo.ocn.Omon.so.ver-1.latestX.xml
...
Traceback (most recent call last):
  File ".//drive_density.py", line 195, in <module>
    densityBin(model[3],model[1],model[5],outfileDensity)
  File "/export/durack1/git/Density_bining/binDensity.py", line 671, in densityBin
    dy   = cdm.createVariable(dy  , axes = [timeyr, s_axis, ingrid], id = 'isondy')
  File "/usr/local/uvcdat/2014-09-22/lib/python2.7/site-packages/cdms2/tvariable.py", line 826, in createVariable
    return TransientVariable(*args,**kargs)
  File "/usr/local/uvcdat/2014-09-22/lib/python2.7/site-packages/cdms2/tvariable.py", line 172, in __init__
    grid = grid.reconcile(axes) # Make sure grid and axes are consistent
  File "/usr/local/uvcdat/2014-09-22/lib/python2.7/site-packages/cdms2/hgrid.py", line 681, in reconcile
    if (item not in used) and len(selfaxes[i])==len(item) and allclose(selfaxes[i], item):
TypeError: object of type 'DatasetCurveGrid' has no len()

https://github.com/durack1/Density_bining/blob/master/binDensity.py#L671 I might have to engage with @doutriaux1 to determine a way to work around this issue..

doutriaux1 commented 9 years ago

@durack1 if ingrid is a grid even regular lat/lon it wouldn't work

I can't remember how to make a curv grid out of thin air, I'll look, but in the mean time if you're going to save to do:

axesList = original_var.getAxisList()

replace the depth axes

axesList[1] = newDepthAxes

and then

dy   = cdm.createVariable(dy  , axes = axesList, id = 'isondy')

then on dy don't forget to add the coordinates attribute on dy and save the "bound variables" into the file.

Then reading in the file back in you should be good to go, I will look for curvlinear calls as well and will post here when I find them.

eguil commented 9 years ago

Hi Paul,

your message on grids is a bit cryptic ! Not sure what you want me to do and why it would not work with regular lat/lon grids. We can do a quick skype today (9 am for you?) if you want (send me a text message if possible/when ready).

I was also thinking of writing out the annual mean 3D lat/lon/rhon on the WOA grid as just looking at zonal means can be quickly restrictive.

On 2/10/14 02:20, Charles wrote:

@durack1 https://github.com/durack1 if ingrid is a grid even regular lat/lon it wouldn't work

I can't remember how to make a curv grid out of thin air, I'll look, but in the mean time if you're going to save to file: if ewoulddo:

axesList = original_var.getAxisList()

replace the depth axes

axesList[1] = newDepthAxes

and then

dy = cdm.createVariable(dy , axes = axesList, id = 'isondy')

then on dy don't forget to add the coordinates attribute on dy ad to save the "bound variables" into the file.

Then reading in the file back in you should be good to go, I will look for curvlinear calls as well and will post here when I find them.

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-57561818.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil the issue here is that the cdm.createVariable syntax is not standard, and when you pass this a curvilinear grid it falls over.. Which is half the CMIP5 suite.. I did hit this issue earlier, but wanted to validate results with the IPSL-CM5A-LR model first before attempting to run across the archive..

I have some work to do to get things working across all the models, ACCESS1-0 was the first in the list to process, and this model is what led to the error above https://github.com/eguil/Density_bining/issues/19#issuecomment-57555179

Skype at 9am sounds fine, I'll try from the office when I get in..

durack1 commented 9 years ago

This is what I see for current files:

[durack1@oceanonly Density_bining]$ ncdump -h test/cmip5.IPSL-CM5A-LR.historical.r1i1p1.an.ocn.Omon.density.ver-v20111119-compressed.nc | grep '(time,'
        double bounds_time(time, bound) ;
        double persim(time, latitude, longitude) ; [ 40 MB per such field our of 220 MB for the file]
        double ptopdepth2(time, latitude, longitude) ;
        double ptoptemp2(time, latitude, longitude) ;
        double ptopsalt2(time, latitude, longitude) ;
        double isonpers(time, rhon, latitude) ;
        double isonpersa(time, rhon, latitude) ;
        double isonpersp(time, rhon, latitude) ;
        double isonpersi(time, rhon, latitude) ;
        double ptopdepth(time, latitude) ;
        double ptopsigma(time, latitude) ;
        double ptoptemp(time, latitude) ;
        double ptopsalt(time, latitude) ;
        double ptopdeptha(time, latitude) ;
        double ptopsigmaa(time, latitude) ;
        double ptoptempa(time, latitude) ;
        double ptopsalta(time, latitude) ;
        double ptopdepthp(time, latitude) ;
        double ptopsigmap(time, latitude) ;
        double ptoptempp(time, latitude) ;
        double ptopsaltp(time, latitude) ;
        double ptopdepthi(time, latitude) ;
        double ptopsigmai(time, latitude) ;
        double ptoptempi(time, latitude) ;
        double ptopsalti(time, latitude) ;
        double isondepth(time, rhon, latitude) ;
        double isonthick(time, rhon, latitude) ;
        double isonvol(time, rhon, latitude) ;
        double thetao(time, rhon, latitude) ;
        double so(time, rhon, latitude) ;
        double isondeptha(time, rhon, latitude) ;
        double isonthicka(time, rhon, latitude) ;
        double isonvola(time, rhon, latitude) ;
        double thetaoa(time, rhon, latitude) ;
        double soa(time, rhon, latitude) ;
        double isondepthp(time, rhon, latitude) ;
        double isonthickp(time, rhon, latitude) ;
        double isonvolp(time, rhon, latitude) ;
        double thetaop(time, rhon, latitude) ;
        double sop(time, rhon, latitude) ;
        double isondepthi(time, rhon, latitude) ;
        double isonthicki(time, rhon, latitude) ;
        double isonvoli(time, rhon, latitude) ;
        double thetaoi(time, rhon, latitude) ;
        double soi(time, rhon, latitude) ;
durack1 commented 9 years ago

@eguil I got sidetracked today and made little progress on this.. I'll get back to it tomorrow..

eguil commented 9 years ago

@durack1 - ok let me know when you made progress. We need to start to do some science !

On 3/10/14 07:23, Paul J. Durack wrote:

@eguil https://github.com/eguil I got sidetracked today and made little progress on this.. I'll get back to it tomorrow..

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-57754248.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil just a heads up, I've got this now running across the whole archive (timeint='1,24') to make sure that I've dealt with all the weird grid cases and will hopefully have these tests completed tomoz.. Once that's done I can then finalise all the outlined issues up top of this post and we've got our prototype ready to roll..

eguil commented 9 years ago

@durack1 - ok great. I've made some changes to my version of binDensity (integral properties and a bug correction). I will have finished the testing Monday to do the merge. Eric On 17/10/14 03:46, Paul J. Durack wrote:

@eguil https://github.com/eguil just a heads up, I've got this now running across the whole archive (timeint='1,24') to make sure that I've dealt with all the weird grid cases and will hopefully have these tests completed tomoz.. Once that's done I can then finalise all the outlined issues up top of this post and we've got our prototype ready to roll..

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-59456055.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil I'm still trying to get this working across the whole archive, some of the model fields are missing certain variables/coords that the code has been using.. I think I'm going to reorganise to do all the reading/testing input fields in one place, so that the issue I'm hitting is easier to find and fix.. Will keep you posted..

durack1 commented 9 years ago

@eguil Ok take 5.. Got it running again to test across the archive, I'll see tomorrow if it managed to get through the lot (all models) or whether I have more model specific tweaks to code around.. I've also got a bunch of additional questions up at https://github.com/eguil/Density_bining/issues/19#issue-43521185 - in particular, are you happy to me to collapse the basin zonal mean results into a single variable [basins, time, rho, lat] so then we reduce the number of total file variables by quite a bit.. And what was the number 6 item "modify axis definition.." I don't remember what this one is..

durack1 commented 9 years ago

@eguil nope bombed again.. Will keep trying..

eguil commented 9 years ago

@durack1, good idea to collapse the basin zonal means as one variable - see other replies & comments in the #19 (comment) https://github.com/eguil/Density_bining/issues/19#issue-43521185 - I will mostly have next week to work on the analysis for the presentation as I am travelling afterwards. I will start with the 3/4 models I ran. On 20/10/14 05:58, Paul J. Durack wrote:

@eguil https://github.com/eguil Ok take 5.. Got it running again to test across the archive, I'll see tomorrow if it managed to get through the lot (all models) or whether I have more model specific tweaks to code around.. I've also got a bunch of additional questions up at #19 (comment) https://github.com/eguil/Density_bining/issues/19#issue-43521185 - in particular, are you happy to me to collapse the basin zonal mean results into a single variable [basins, time, rho, lat] so then we reduce the number of total file variables by quite a bit.. And what was the number 6 item "modify axis definition.." I don't remember what this one is..

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-59681234.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil what's the z_s variable and is it needed? https://github.com/durack1/Density_bining/blob/master/binDensity.py#L520 I've been trying to figure out what the indexing is (it's a weird combo of boolean and -1's).. I've just commented it out and seems to be running across the models now.. Will see how far it gets, and also what the outputs look like tomoz.. Looks like it's got to the C*'s..

eguil commented 9 years ago

@durack1 z_s is the depth of the bin. The line you commented sets the max depth of each column to the highest density (index N_s). It is most of the time rewritten by the interpolation line below : z_s [0:N_s,i] = npy.interp(s_s[:,i], szm[:,i], zzm[:,i], right = valmask) ; # consider spline
It is strange this is creating problems...

durack1 commented 9 years ago

@eguil there are some weird cases where the input data trip over your code, so think I've worked out most of these quirks.. I've managed to get it running across all models (alphabetically) up to MIROC4h.. And have just tweaked things to deal with that model, so assuming it runs across the remaining inputs we should have the code working.. I'm now rewriting the outputs to clean them up, and then will kick it off for the whole historical archive..

eguil commented 9 years ago

@durack1 ok let me know how it goes. Have you integrated my latest version of binDensity.py ?

durack1 commented 9 years ago

@eguil I've hit a problem with the code that is going to need your intervention to sort out - I'm still not completely following how you're getting the result. My repo https://github.com/durack1/Density_bining/commit/0a016255197f440b162659f6d7c336d188165f32 should have a working version that is returning all masked values - I'm not sure what I've tangled up..

If you use drive_density: [durack1@oceanonly Density_bining]$ drive_density.py cmip5 historical test it should work as advertised, and I've got the 5 models (which I had to fiddle code to get it working) only set to run for the 24 months, so get this test case working and I think we're off.. You may need to update your version of durolib - https://raw.githubusercontent.com/durack1/pylib/master/durolib.py - I've dropped in some more dob functions to let us know which version of code was used to write the outfiles..

I've also collapsed all the zonal means into a single variable (and done some renaming), however the data in the arrays is rubbish due to what I think is a mask propagation issue..?

If you can get this back working, it's good to go (I think!?!).

It could still be further optimized - converting all code to pure numpy and removing use of npy.ma when we can.. But I think a result is more important than optimization at this stage, so we can leave it as is.. The output files are not fully CF-compliant, but that again is another less important issue, as these files wont be redistributed..

Probably a good idea to skype about this early this week if you're able..

eguil commented 9 years ago

@durack1 ok I'll have a look tomorrow. We can skype tomorrow (Tuesday) at 5 or 6 pm for me 9 or 10am for you. On 27/10/14 05:08, Paul J. Durack wrote:

@eguil https://github.com/eguil I've hit a problem with the code that is going to need your intervention to sort out - I'm still not completely following how you're getting the result. My repo durack1@0a01625 https://github.com/durack1/Density_bining/commit/0a016255197f440b162659f6d7c336d188165f32 should have a working version that is returning all masked values - I'm not sure what I've tangled up..

If you use drive_density: |[durack1@oceanonly Density_bining]$ drive_density.py cmip5 historical test| it should work as advertised, and I've got the 5 models which I had to fiddle code to get it working.. You may need to update your version of durolib - https://raw.githubusercontent.com/durack1/pylib/master/durolib.py - I've dropped in some more dob functions to let us know which version of code was used to write the outfiles..

I've also collapsed all the zonal means into a single variable (and done some renaming), however the data in the arrays is rubbish due to what I think is a mask propagation issue..?

If you can get this back working, it's good to go (I think!?!).

It could still be further optimized - converting all code to pure numpy and removing use of npy.ma when we can.. But I think a result is more important than optimization at this stage, so we can leave it as is.. The output files are not fully CF-compliant, but that again is another less important issue, as these files wont be redistributed..

Probably a good idea to skype about this early this week if you're able..

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-60548526.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil sounds good.. We're close.. I also checked out the CF-compliance of the output files (even though the data is rubbish at the moment) and it's not as bad as I thought, so once this masking issue is resolved should be good to start running across the archive and doing some science!

9am tomorrow sounds great..

The new output format looks like:

[durack1@oceanonly Density_bining]$ ncdump -h test/cmip5.IPSL-CM5A-LR.historical.r5i1p1.an.ocn.Omon.density.ver-v20111119.nc 
netcdf cmip5.IPSL-CM5A-LR.historical.r5i1p1.an.ocn.Omon.density.ver-v20111119 {                                              
dimensions:                                                                                                                  
        time = UNLIMITED ; // (2 currently)                                                                                  
        bound = 2 ;                                                                                                          
        latitude = 180 ;                                                                                                     
        longitude = 360 ;                                                                                                    
        basin = 4 ;                                                                                                          
        lev = 61 ;                                                                                                           
variables:                                                                                                                   
        double time(time) ;                                                                                                  
                time:bounds = "bounds_time" ;                                                                                
                time:units = "days since 1850-01-01 00:00:00" ;                                                              
                time:calendar = "gregorian" ;                                                                                
                time:axis = "T" ;                                                                                            
        double bounds_time(time, bound) ;                                                                                    
        float latitude(latitude) ;                                                                                           
                latitude:bounds = "bounds_latitude" ;                                                                        
                latitude:units = "" ;                                                                                        
                latitude:axis = "Y" ;                                                                                        
        double bounds_latitude(latitude, bound) ;                                                                            
        float longitude(longitude) ;                                                                                         
                longitude:bounds = "bounds_longitude" ;                                                                      
                longitude:units = "" ;                                                                                       
                longitude:axis = "X" ;                                                                                       
                longitude:modulo = 360. ;                                                                                    
                longitude:topology = "circular" ;                                                                            
        double bounds_longitude(longitude, bound) ;                                                                          
        float persistmxy(time, latitude, longitude) ;                                                                        
                persistmxy:long_name = "Fraction of persistence on isopycnal bins" ;                                         
                persistmxy:units = "% of column" ;                                                                           
                persistmxy:missing_value = 1.e+20f ;                                                                         
        float ptopdepthxy(time, latitude, longitude) ;                                                                       
                ptopdepthxy:long_name = "Depth of shallowest persistent ocean on ison" ;                                     
                ptopdepthxy:units = "m" ;                                                                                    
                ptopdepthxy:missing_value = 1.e+20f ;                                                                        
        float ptopthetaoxy(time, latitude, longitude) ;                                                                      
                ptopthetaoxy:long_name = "Temp. of shallowest persistent ocean on ison" ;                                    
                ptopthetaoxy:units = "degrees_C" ;                                                                           
                ptopthetaoxy:missing_value = 1.e+20f ;                                                                       
        float ptopsoxy(time, latitude, longitude) ;                                                                          
                ptopsoxy:long_name = "Salinity of shallowest persistent ocean on ison" ;                                     
                ptopsoxy:units = "psu" ;                                                                                     
                ptopsoxy:missing_value = 1.e+20f ;                                                                           
        int basin(basin) ;                                                                                                   
                basin:bounds = "bounds_basin" ;                                                                              
                basin:units_long = "0: global_ocean 1: atlantic_ocean; 2: pacific_ocean; 3: indian_ocean" ;
                basin:long_name = "ocean basin index" ;
                basin:standard_name = "basin" ;
                basin:units = "basin index" ;
                basin:axis = "B" ;
        double bounds_basin(basin, bound) ;
        double lev(lev) ;
                lev:bounds = "bounds_lev" ;
                lev:units_long = "kg m-3 (anomaly, minus 1000)" ;
                lev:positive = "down" ;
                lev:long_name = "ocean neutral density coordinate" ;
                lev:standard_name = "lev" ;
                lev:units = "kg m-3" ;
                lev:axis = "Z" ;
        double bounds_lev(lev, bound) ;
        float isonpers(time, basin, lev, latitude) ;
                isonpers:long_name = "zonal persistence of isopycnal bins" ;
                isonpers:units = "% of time" ;
                isonpers:missing_value = 1.e+20f ;
        float ptopdepth(time, basin, latitude) ;
                ptopdepth:long_name = "Zonal depth of shallowest persistent ocean on ison" ;
                ptopdepth:units = "m" ;
                ptopdepth:missing_value = 1.e+20f ;
        float ptopsigma(time, basin, latitude) ;
                ptopsigma:long_name = "Zonal rhon of shallowest persistent ocean on ison" ;
                ptopsigma:units = "sigma_n" ;
                ptopsigma:missing_value = 1.e+20f ;
        float ptopthetao(time, basin, latitude) ;
                ptopthetao:long_name = "Zonal Temp. of shallowest persistent ocean on ison" ;
                ptopthetao:units = "degrees_C" ;
                ptopthetao:missing_value = 1.e+20f ;
        float ptopso(time, basin, latitude) ;
                ptopso:long_name = "Zonal Salinity of shallowest persistent ocean on ison" ;
                ptopso:units = "psu" ;
                ptopso:missing_value = 1.e+20f ;
        float isondepth(time, basin, lev, latitude) ;
                isondepth:long_name = "Global zonal depth of isopycnal" ;
                isondepth:units = "m" ;
                isondepth:missing_value = 1.e+20f ;
        float isonthick(time, basin, lev, latitude) ;
                isonthick:long_name = "Global zonal thickness of isopycnal" ;
                isonthick:units = "m" ;
                isonthick:missing_value = 1.e+20f ;
        float isonvol(time, basin, lev, latitude) ;
                isonvol:long_name = "Volume of isopycnal" ;
                isonvol:units = "10.e12 m^3" ;
                isonvol:missing_value = 1.e+20f ;
        float isonthetao(time, basin, lev, latitude) ;
                isonthetao:long_name = "Sea Water Potential Temperature" ;
                isonthetao:units = "degrees_C" ;
                isonthetao:missing_value = 1.e+20f ;
        float isonso(time, basin, lev, latitude) ;
                isonso:long_name = "Sea Water Salinity" ;
                isonso:units = "psu" ;
                isonso:missing_value = 1.e+20f ;

// global attributes:
                :Conventions = "CF-1.0" ;
                :institution = "Program for Climate Model Diagnosis and Intercomparison (LLNL)" ;
                :data_contact = "Paul J. Durack; pauldurack@llnl.gov; +1 925 422 5208" ;
                :history = "File processed: 26-10-2014 23:05:51 PM UTC; San Francisco, CA, USA" ;
                :host = "oceanonly.llnl.gov; UVCDAT version: 2.0.0-3-g38214e5; Python version: 2.7.7 (default, Oct 15 2014, 10:21:57); [GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]" ;
                :binDensity_version = "commit c44e26d892c017f996c4233e06afb8bcb25691fe Author: Paul J. Durack <durack1@llnl.gov> Date: Thu Oct 23 18:42:58 2014 -0700" ;
}
eguil commented 9 years ago

@durack1 can you remind me how to pull your version accross to my repo for a merge ?

On 27/10/14 19:44, Paul J. Durack wrote:

@eguil https://github.com/eguil sounds good.. We're close.. I also checked out the CF-compliance of the output files (even though the data is rubbish at the moment) and it's not as bad as I thought, so once this masking issue is resolved should be good to start running across the archive and doing some science!

9am tomorrow sounds great..

The new output format looks like:

|[durack1@oceanonly Density_bining]$ ncdump -h test/cmip5.IPSL-CM5A-LR.historical.r5i1p1.an.ocn.Omon.density.ver-v20111119.nc netcdf cmip5.IPSL-CM5A-LR.historical.r5i1p1.an.ocn.Omon.density.ver-v20111119 { dimensions: time = UNLIMITED ; // (2 currently) bound = 2 ; latitude = 180 ; longitude = 360 ; basin = 4 ; lev = 61 ; variables: double time(time) ; time:bounds = "bounds_time" ; time:units = "days since 1850-01-01 00:00:00" ; time:calendar = "gregorian" ; time:axis = "T" ; double bounds_time(time, bound) ; float latitude(latitude) ; latitude:bounds = "bounds_latitude" ; latitude:units = "" ; latitude:axis = "Y" ; double bounds_latitude(latitude, bound) ; float longitude(longitude) ; longitude:bounds = "bounds_longitude" ; longitude:units = "" ; longitude:axis = "X" ; longitude:modulo = 360. ; longitude:topology = "circular" ; double bounds_longitude(longitude, bound) ; float persistmxy(time, latitude, longitude) ; persistmxy:long_name = "Fraction of persistence on isopycnal bins" ; persistmxy:units = "% of column" ; persistmxy:missing_value = 1.e+20f ; float ptopdepthxy(time, latitude, longitude) ; ptopdepthxy:long_name = "Depth of shallowest persistent ocean on ison" ; ptopdepthxy:units = "m" ; ptopdepthxy:missing_value = 1.e+20f ; float ptopthetaoxy(time, latitude, longitude) ; ptopthetaoxy:long_name = "Temp. of shallowest persistent ocean on ison" ; ptopthetaoxy:units = "degrees_C" ; ptopthetaoxy:missing_value = 1.e+20f ; float ptopsoxy(time, latitude, longitude) ; ptopsoxy:long_name = "Salinity of shallowest persistent ocean on ison" ; ptopsoxy:units = "psu" ; ptopsoxy:missing_value = 1.e+20f ; int basin(basin) ; basin:bounds = "bounds_basin" ; basin:units_long = "0: global_ocean 1: atlantic_ocean; 2: pacific_ocean; 3: indian_ocean" ; basin:long_name = "ocean basin index" ; basin:standard_name = "basin" ; basin:units = "basin index" ; basin:axis = "B" ; double bounds_basin(basin, bound) ; double lev(lev) ; lev:bounds = "bounds_lev" ; lev:units_long = "kg m-3 (anomaly, minus 1000)" ; lev:positive = "down" ; lev:long_name = "ocean neutral density coordinate" ; lev:standard_name = "lev" ; lev:units = "kg m-3" ; lev:axis = "Z" ; double bounds_lev(lev, bound) ; float isonpers(time, basin, lev, latitude) ; isonpers:long_name = "zonal persistence of isopycnal bins" ; isonpers:units = "% of time" ; isonpers:missing_value = 1.e+20f ; float ptopdepth(time, basin, latitude) ; ptopdepth:long_name = "Zonal depth of shallowest persistent ocean on ison" ; ptopdepth:units = "m" ; ptopdepth:missing_value = 1.e+20f ; float ptopsigma(time, basin, latitude) ; ptopsigma:long_name = "Zonal rhon of shallowest persistent ocean on ison" ; ptopsigma:units = "sigma_n" ; ptopsigma:missing_value = 1.e+20f ; float ptopthetao(time, basin, latitude) ; ptopthetao:long_name = "Zonal Temp. of shallowest persistent ocean on ison" ; ptopthetao:units = "degrees_C" ; ptopthetao:missing_value = 1.e+20f ; float ptopso(time, basin, latitude) ; ptopso:long_name = "Zonal Salinity of shallowest persistent ocean on ison" ; ptopso:units = "psu" ; ptopso:missing_value = 1.e+20f ; float isondepth(time, basin, lev, latitude) ; isondepth:long_name = "Global zonal depth of isopycnal" ; isondepth:units = "m" ; isondepth:missing_value = 1.e+20f ; float isonthick(time, basin, lev, latitude) ; isonthick:long_name = "Global zonal thickness of isopycnal" ; isonthick:units = "m" ; isonthick:missing_value = 1.e+20f ; float isonvol(time, basin, lev, latitude) ; isonvol:long_name = "Volume of isopycnal" ; isonvol:units = "10.e12 m^3" ; isonvol:missing_value = 1.e+20f ; float isonthetao(time, basin, lev, latitude) ; isonthetao:long_name = "Sea Water Potential Temperature" ; isonthetao:units = "degrees_C" ; isonthetao:missing_value = 1.e+20f ; float isonso(time, basin, lev, latitude) ; isonso:long_name = "Sea Water Salinity" ; isonso:units = "psu" ; isonso:missing_value = 1.e+20f ;

// global attributes: :Conventions = "CF-1.0" ; :institution = "Program for Climate Model Diagnosis and Intercomparison (LLNL)" ; :data_contact = "Paul J. Durack; pauldurack@llnl.gov; +1 925 422 5208" ; :history = "File processed: 26-10-2014 23:05:51 PM UTC; San Francisco, CA, USA" ; :host = "oceanonly.llnl.gov; UVCDAT version: 2.0.0-3-g38214e5; Python version: 2.7.7 (default, Oct 15 2014, 10:21:57); [GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]" ; :binDensity_version = "commit c44e26d892c017f996c4233e06afb8bcb25691fe Author: Paul J. Durack durack1@llnl.gov Date: Thu Oct 23 18:42:58 2014 -0700" ; } |

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-60647525.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil I can issue a pull request.. But that will wipe out your changes since my last pull from your repo..

You could try: git pull https://github.com/durack1/Density_bining master

durack1 commented 9 years ago

@eguil October 1st was my last merge/pull from your repo: https://github.com/durack1/Density_bining/commits/master?page=2

eguil commented 9 years ago

@durack1 but how can I merge the changes made from Oct 1 by you and by me ? On 27/10/14 23:28, Paul J. Durack wrote:

@eguil https://github.com/eguil I can issue a pull request.. But that will wipe out your changes since my last pull from your repo..

You could try: |git pull https://github.com/durack1/Density_bining master|

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-60680772.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

It'll require some intervention.. Here's what git recommends:

Step 1: From your project repository, check out a new branch and test the changes.

git checkout -b durack1-master master
git pull git://github.com/durack1/Density_bining.git master

Step 2: Merge the changes and update on GitHub.

git checkout master
git merge --no-ff durack1-master
git push origin master
durack1 commented 9 years ago

@eguil if your 5 model test was still running you've probably found that it failed.. I've managed to make some headway toward getting this running, but am still struggling with it.. If you have another go pull my version across.. I've commented out your new variables as they're particularly problematic..

If you're able to get it running across the 5 test models then let me know.. If you achieve this I think we then have a functioning prototype and I'll kick this off on the whole archive..

eguil commented 9 years ago

@durack1 I am making progress. ACCESS and BNU are ok (although for some reason the basin zonal means for ptopthetao/so/sigma are all equal to the global mean, but ok for ptopdepth - I am investigating). I propose to drop EC-EARTH as the so and thetao have no valmask value (only 0!). The other option is the change the files themselves to add back the right valmask or to create a proc that would rebuild it from the zero salinity values - I suggest we keep this one for later. On 29/10/14 05:11, Paul J. Durack wrote:

@eguil https://github.com/eguil if your 5 model test was still running you've probably found that it failed.. I've managed to make some headway toward getting this running, but am still struggling with it.. If you have another go pull my version across.. I've commented out your new variables as they're particularly problematic..

If you're able to get it running across the 5 test models then let me know.. If you achieve this I think we then have a functioning prototype and I'll kick this off on the whole archive..

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-60871588.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg
durack1 commented 9 years ago

@eguil I have code which should sort out EC-EARTH - it's at https://github.com/durack1/Density_bining/blob/master/binDensity.py#L515-520

eguil commented 9 years ago

@durack1 - ok I will try On 29/10/14 14:24, Paul J. Durack wrote:

@eguil https://github.com/eguil I have code which /should/ sort out EC-EARTH - it's at https://github.com/durack1/Density_bining/blob/master/binDensity.py#L515-520

— Reply to this email directly or view it on GitHub https://github.com/eguil/Density_bining/issues/19#issuecomment-60922473.

Eric Guilyardi IPSL/LOCEAN - Dir. Rech. CNRS Tour 45, 4eme, piece 406 UPMC, case 100 4 place Jussieu, F-75252 Paris Tel: +33 (0)1 44 27 70 76 Prof. Eric Guilyardi NCAS Climate Meteorology Department University of Reading Reading RG6 6BB - UK Tel: +44 (0)118 378 8315

             http://ncas-climate.nerc.ac.uk/~ericg