Another ice error - GitHub Issues

AndyHoggANU commented 5 years ago

I started reprocessing the 0.1° ice data, as last night's successful run only did up to output029. When I applied it to the whole dataset, this error resurfaced:

ValueError: variable TLON not equal across datasets

AndyHoggANU commented 5 years ago

This happens for all 5 ice variables, BTW.

aidanheerdegen commented 5 years ago

Odd. It didn't for me, but I only tested one variable. Was I not using the whole range?

AndyHoggANU commented 5 years ago

Yes - the code on github specified output0[0-2]?, which was left over from my last failed attempt. I think I have a way of figuring out where this one is, so leave it with me for a couple of hours.

aidanheerdegen commented 5 years ago

It is a fragile process, unfortunately, as you're finding out, so I should have been testing with the full production dataset. Happy to revisit, as I've now gained a lot of unwanted expertise ...

AndyHoggANU commented 5 years ago

OK, so some good news at last. The first thing is that processing in 4-5 year chunks will work and is not too tricky. It turns out that:

output0[0-2]? selects just 1985-89,
output0[3-5]? selects just 1990-94,
output0[6-8]? selects just 1995-99,
output1[2-4]? selects just 2005-09,
output1[5-7]? selects just 2010-14,
output1[8-9]? selects just 2015-17,

The issue is with output09? and output1[0-1]?, which I can pass through using two filename arguments, but this is where the TLON error is. Drilling down into that one now.
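As a rough illustration of the batching described above, the following sketch checks which directories each shell pattern selects. The directory listing (output000 to output199) is a made-up stand-in, not the real contents of the experiment directory, and `select` is a hypothetical helper; only the patterns come from the comment.

```python
from fnmatch import fnmatch

# Hypothetical directory listing: output000 ... output199 (an assumption).
all_dirs = [f"output{i:03d}" for i in range(200)]

def select(pattern, names=all_dirs):
    """Return the names matching a shell-style pattern, in sorted order."""
    return sorted(n for n in names if fnmatch(n, pattern))

# The six patterns that each worked as a single batch.
single_batches = ["output0[0-2]?", "output0[3-5]?", "output0[6-8]?",
                  "output1[2-4]?", "output1[5-7]?", "output1[8-9]?"]

# The remaining batch needs two patterns: output09? and output1[0-1]?.
split_batch = select("output09?") + select("output1[0-1]?")

# Collect everything the seven batches cover between them.
covered = set(split_batch)
for pattern in single_batches:
    covered.update(select(pattern))

print(split_batch[0], split_batch[-1], len(split_batch))
print(len(covered) == len(all_dirs))
```

Under that assumed listing, the seven batches cover every directory exactly once, and the two-pattern batch runs from output090 to output119.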

AndyHoggANU commented 5 years ago

OK, here is another curiosity. My 1985-89 ice files take a lot longer to process, and produce much larger output files, than any other 5-year segment.

amh157@raijin2:ice % du -sh ./hi-m/*
631M    ./hi-m/hi-m_access-om2-01_198501_198512.nc
631M    ./hi-m/hi-m_access-om2-01_198601_198612.nc
631M    ./hi-m/hi-m_access-om2-01_198701_198712.nc
631M    ./hi-m/hi-m_access-om2-01_198801_198812.nc
631M    ./hi-m/hi-m_access-om2-01_198901_198912.nc
88M     ./hi-m/hi-m_access-om2-01_199001_199012.nc
88M     ./hi-m/hi-m_access-om2-01_199101_199112.nc
88M     ./hi-m/hi-m_access-om2-01_199201_199212.nc
88M     ./hi-m/hi-m_access-om2-01_199301_199312.nc
...

Yet I can't see any real difference in the final files. Is this due to the way the originals are compressed? Any hints here? I'm not so worried about the size, just the consistency ...

AndyHoggANU commented 5 years ago

So, this TLON error is somewhere in 2004 ...

aidanheerdegen commented 5 years ago

The larger files aren't compressed. hi-m_access-om2-01_198501_198512:

        float hi_m(time, nj, ni) ;
                hi_m:_FillValue = 1.e+30f ;
                hi_m:units = "m" ;
                hi_m:long_name = "grid cell mean ice thickness" ;
                hi_m:cell_measures = "area: tarea" ;
                hi_m:cell_methods = "time: mean" ;
                hi_m:time_rep = "averaged" ;
                hi_m:coordinates = "TLON ULAT TLAT ULON" ;
                hi_m:_Storage = "chunked" ;
                hi_m:_ChunkSizes = 1, 675, 900 ;
                hi_m:_Endianness = "little" ;

hi-m_access-om2-01_199001_199012

        float hi_m(time, nj, ni) ;
                hi_m:_FillValue = 1.e+30f ;
                hi_m:units = "m" ;
                hi_m:long_name = "grid cell mean ice thickness" ;
                hi_m:cell_measures = "area: tarea" ;
                hi_m:cell_methods = "time: mean" ;
                hi_m:time_rep = "averaged" ;
                hi_m:coordinates = "TLON TLAT ULON ULAT" ;
                hi_m:_Storage = "chunked" ;
                hi_m:_ChunkSizes = 1, 675, 900 ;
                hi_m:_DeflateLevel = 5 ;
                hi_m:_Shuffle = "true" ;
                hi_m:_Endianness = "little" ;

I'm not sure why that would be the case. The input files are compressed.
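One way to spot this difference automatically is to look for the `_DeflateLevel` attribute in the header text. The sketch below assumes the headers were produced with something like `ncdump -hs` (the option that prints the underscore-prefixed storage attributes); `deflate_level` is a hypothetical helper, and the sample strings are the relevant attribute lines copied from the two headers above.

```python
import re

def deflate_level(header, varname):
    """Return the _DeflateLevel recorded for varname in the header text, or None."""
    pattern = rf"\b{re.escape(varname)}:_DeflateLevel\s*=\s*(\d+)\s*;"
    match = re.search(pattern, header)
    return int(match.group(1)) if match else None

# Storage attributes from the 1985 file (no deflate attribute present).
header_1985 = '''
        hi_m:_Storage = "chunked" ;
        hi_m:_ChunkSizes = 1, 675, 900 ;
        hi_m:_Endianness = "little" ;
'''

# Storage attributes from the 1990 file (deflate level 5 with shuffle).
header_1990 = '''
        hi_m:_Storage = "chunked" ;
        hi_m:_ChunkSizes = 1, 675, 900 ;
        hi_m:_DeflateLevel = 5 ;
        hi_m:_Shuffle = "true" ;
        hi_m:_Endianness = "little" ;
'''

print(deflate_level(header_1985, "hi_m"))  # None
print(deflate_level(header_1990, "hi_m"))  # 5
```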

AndyHoggANU commented 5 years ago

Hmm. Curious. I re-ran these and they still seem to be uncompressed ... I guess I should just compress them and be done with it?
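The thread does not say which tool was used for the compression step. One common option is `nccopy` from the netCDF utilities, with `-d` setting the deflate level and `-s` enabling shuffle. The sketch below only builds the command; `nccopy_command` and the output file name are hypothetical, and level 5 with shuffle is chosen to match the compressed files shown earlier.

```python
import shlex

def nccopy_command(src, dst, level=5, shuffle=True):
    """Build an nccopy command that rewrites src to dst with deflate compression."""
    cmd = ["nccopy", "-d", str(level)]
    if shuffle:
        cmd.append("-s")
    cmd += [src, dst]
    return cmd

# The output name below is an assumption made for illustration.
cmd = nccopy_command("hi-m_access-om2-01_198501_198512.nc",
                     "hi-m_access-om2-01_198501_198512.compressed.nc")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Writing to a new file and swapping it in afterwards avoids overwriting the input while it is still being read.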

AndyHoggANU commented 5 years ago

OK, for the TLON error, I have reduced it down to a difference between output114 and output115 ... but I can't see any difference. Can you take a look and see if you see anything?

What does splitvar use to read it in? I might try and emulate that to see if I can recreate the bug in python.

aidanheerdegen commented 5 years ago

The land masking changes between iceh.2004-02.nc and iceh.2004-03.nc

[Screenshot: Screen Shot 2019-08-21 at 12 28 29 pm]

I'll look into a work-around
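To make the failure mode concrete, here is a toy sketch of why a change in land masking makes a strict equality check fail even when the underlying grid is unchanged. The longitude values are invented, and the two functions are hypothetical illustrations rather than the check that splitvar or xarray actually performs; the fill value matches the `_FillValue` of 1e+30 reported for TLON in this thread.

```python
FILL = 1e30  # same value as the _FillValue reported for TLON

def equal_everywhere(a, b):
    """Strict comparison: masked cells count as ordinary values."""
    return a == b

def equal_where_both_valid(a, b, fill=FILL):
    """Compare only the cells that are unmasked in both arrays."""
    return all(x == y for x, y in zip(a, b) if x != fill and y != fill)

# Toy flattened longitudes: the second month masks one extra land cell.
tlon_feb = [80.05, 80.15, 80.25, FILL]
tlon_mar = [80.05, FILL, 80.25, FILL]

print(equal_everywhere(tlon_feb, tlon_mar))        # False
print(equal_where_both_valid(tlon_feb, tlon_mar))  # True
```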

aidanheerdegen commented 5 years ago

I've pushed a fix (https://github.com/coecms/splitvar/commit/0ff571cb823a6a16a6a8b151505d4d4f523fbb24) and will install it into conda.

AndyHoggANU commented 5 years ago

Oh, that's weird. OK, can you let me know when the conda is updated on raijin and I will give it a go. Thanks!!

aidanheerdegen commented 5 years ago

Oh, now I see why that works, it concatenates TLON and adds a time dimension:

>>> ds = xarray.open_mfdataset('/g/data3/hh5/tmp/cosima/access-om2-01/01deg_jra55v13_iaf/output11[4-5]/ice/OUTPUT/iceh.2004-0[2-3].nc',decode_cf=False,  engine='netcdf4')
>>> ds.TLON
<xarray.DataArray 'TLON' (time: 2, nj: 2700, ni: 3600)>
dask.array<shape=(2, 2700, 3600), dtype=float32, chunksize=(1, 2700, 3600)>
Coordinates:
  * time     (time) float64 6.999e+03 7.03e+03
Dimensions without coordinates: nj, ni
Attributes:
    long_name:      T grid center longitude
    units:          degrees_east
    missing_value:  1e+30
    _FillValue:     1e+30

Ok, maybe that isn't a good idea. If you try and use open_mfdataset on these split files it will have issues.

AndyHoggANU commented 5 years ago

OK, but it used to work on the original files, so why would it fail now?

AndyHoggANU commented 5 years ago

Compression is done, BTW.

AndyHoggANU commented 5 years ago

For what it's worth, I have collated the remaining ice files (2000-2004) but there seem to be some compression issues creeping back in.

AndyHoggANU commented 5 years ago

Latest news: Mostly good. I have reprocessed all the ice files for access-om2-01. I still had to do it in 7 batches, which took some time, but that could be easily programmed into the bash script in the future. The files were all the same size ... but all uncompressed. That's OK, as compressing is just a small extra step which can be done in one hit, so is a small impost.

The only catch that I can see is that TLON masking is still different at the start and end of the run.

Did we expect this might be fixed? I did ... but I'm not worried about it provided that xarray can read in the whole dataset in one hit. I will try to test this. If it can't, then we may need to come back to this code and try again!!

AndyHoggANU commented 5 years ago

Update: I see what has happened here. xarray has interpreted the changing TLON as needing its own time dimension, as suggested by Aidan above:

<xarray.Dataset>
Dimensions:      (d2: 2, ni: 3600, nj: 2700, time: 216)
Coordinates:
  * time         (time) datetime64[ns] 2000-01-16 2000-02-15 ... 2017-12-16
    TLON         (time, nj, ni) float32 dask.array<shape=(216, 2700, 3600), chunksize=(12, 2700, 3600)>
    TLAT         (time, nj, ni) float32 dask.array<shape=(216, 2700, 3600), chunksize=(12, 2700, 3600)>
    ULAT         (time, nj, ni) float32 dask.array<shape=(216, 2700, 3600), chunksize=(12, 2700, 3600)>
    ULON         (time, nj, ni) float32 dask.array<shape=(216, 2700, 3600), chunksize=(12, 2700, 3600)>
Dimensions without coordinates: d2, ni, nj
Data variables:
    aice_m       (time, nj, ni) float32 dask.array<shape=(216, 2700, 3600), chunksize=(12, 2700, 3600)>
    tarea        (time, nj, ni) float32 dask.array<shape=(216, 2700, 3600), chunksize=(12, 2700, 3600)>

As it stands, this uses up extra space, but it is usable, in that the dataset can be loaded by xarray, etc. I think this means that either the -a grid.nc argument hasn't worked, or else I haven't applied it properly??

AndyHoggANU commented 5 years ago

I'm finally coming back to this data processing. Note that, while the ice files are all processed, we still have the problem that all files retain the time-dependent grid information. @aidanheerdegen - is there a workaround for this?

aidanheerdegen commented 5 years ago

The code has this comment:

# Make a grid file because we're going to delete all the grid information and add it back
# as it isn't consistent across the data
#ncks -O -v TLON,TLAT,ULON,ULAT,NCAT,tmask,uarea,tarea,blkmask,dxt,dyt,dxu,dyu,HTN,HTE,ANGLE,ANGLET ${COSIMADIR}/${MODEL}/${EXPT}/output197/${SUBMODEL}/OUTPUT/iceh.????-12.nc grid.nc

which sounds like what I was suggesting. I take it this doesn't work?

AndyHoggANU commented 5 years ago

OK, I am testing this with the 1° ice cases -- it produces a grid.nc file for me, with no time-dependence in TLON, TLAT etc. -- but from what I can see the data files retain time-dependence. So, I am guessing that the -a grid.nc argument isn't working as intended?

AndyHoggANU commented 5 years ago

OK - my bad! My script was picking up the stable version of conda/analysis3 -- I needed unstable of course! Testing this now - will close if all works.

AndyHoggANU commented 5 years ago

OK - it works at 1°. I still need to test at higher resolution but let's be optimistic and close this for now!

aidanheerdegen / publish_cosima_data

Another ice error #18