Closed alaniwi closed 7 years ago
Noting that cdo was built before grib_api, which is wrong. But rebuilding cdo (tested on jasmin-sci1-dev) has not fixed it. The build order of relevant(?) packages, with this fixed, is as follows:
[iwi@jasmin-sci1-dev ~]$ rpm -q --queryformat="%{BUILDTIME} %{BUILDTIME:date} %{NAME}\n" grib_api netcdf cdo hdf5 udunits | sort -n
1449490276 Mon 07 Dec 2015 12:11:16 GMT hdf5
1460044475 Thu 07 Apr 2016 16:54:35 BST netcdf
1460066287 Thu 07 Apr 2016 22:58:07 BST udunits
1460066937 Thu 07 Apr 2016 23:08:57 BST grib_api
1462295346 Tue 03 May 2016 18:09:06 BST cdo
which should be okay(?)
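The same check can be scripted rather than read by eye. Below is a minimal sketch that parses output in the format produced by the rpm query above and confirms one package was built after another. The helper names build_order and built_after are hypothetical, and the sample text is simply the output quoted in this comment.

```python
def build_order(rpm_output):
    """Return package names sorted by build epoch.

    Each line is expected to start with the %{BUILDTIME} epoch and
    end with the %{NAME} field, as in the rpm query above.
    """
    rows = []
    for line in rpm_output.strip().splitlines():
        fields = line.split()
        rows.append((int(fields[0]), fields[-1]))
    return [name for _, name in sorted(rows)]


def built_after(rpm_output, package, dependency):
    """True if `package` has a later build time than `dependency`."""
    order = build_order(rpm_output)
    return order.index(package) > order.index(dependency)


SAMPLE = """\
1449490276 Mon 07 Dec 2015 12:11:16 GMT hdf5
1460044475 Thu 07 Apr 2016 16:54:35 BST netcdf
1460066287 Thu 07 Apr 2016 22:58:07 BST udunits
1460066937 Thu 07 Apr 2016 23:08:57 BST grib_api
1462295346 Tue 03 May 2016 18:09:06 BST cdo
"""

print(build_order(SAMPLE))
print(built_after(SAMPLE, "cdo", "grib_api"))
```

This only compares build timestamps; it says nothing about which library versions cdo was actually linked against.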
Some traceback after disabling optimisations (and, for good measure, linking with -lefence, although this doesn't seem to affect the interesting part of the traceback - still the same bit of application code that triggers it)
#0 0x0000003540c32907 in kill () from /lib64/libc.so.6
#1 0x00007fb2543b61e5 in ?? () from /usr/lib64/libefence.so.0
#2 0x00007fb2543b675d in EF_Abort () from /usr/lib64/libefence.so.0
#3 0x00007fb2543b5a13 in free () from /usr/lib64/libefence.so.0
#4 0x00000000005ea116 in memFree (ptr=0x7fb254384848, file=0x8182f7 "grid.c",
functionname=0x819ae0 "grid_free_components", line=179) at dmemory.c:513
#5 0x00000000005eeaeb in grid_free_components (gridptr=0x7fb254393860) at grid.c:179
#6 0x00000000005ef9fc in gridDestroyKernel (gridptr=0x7fb254393860) at grid.c:579
#7 0x00000000005efa98 in gridDestroyP (gridptr=0x7fb254393860) at grid.c:603
#8 0x00000000005fe214 in reshListDestruct (namespaceID=0) at resource_handle.c:180
#9 0x00000000005fd5fd in namespaceDelete (namespaceID=0) at namespace.c:214
#10 0x00000000005fe343 in listDestroy () at resource_handle.c:197
#11 0x0000003540c35b22 in exit () from /lib64/libc.so.6
#12 0x0000003540c1ed64 in __libc_start_main () from /lib64/libc.so.6
#13 0x00000000004073c9 in _start ()
(all these calls in the application are within the libcdi/src directory of the code)
Bug not reproducible with cdo 1.7.0 (linked to all the same lib versions), so apparently only introduced with cdo 1.7.1.
Specifically, seems to be while freeing pointer gridptr->rowlon in grid_free_components. The pointer value is plausible compared to other pointers (e.g. gridptr = 0x7fb254393860, gridptr->rowlon = 0x7fb254384848), but electric fence claims "address not from malloc()".
Another user-supplied example:
cdo -M -f grb mergetime ggam201210290600.grb ggam201210291200.grb test.grb
using input files from /badc/ecmwf-era-interim/data/gg/am/2012/10/29
(http://dap.ceda.ac.uk/data/badc/ecmwf-era-interim/data/gg/am/2012/10/29/)
This aborts at exactly the same place as the sinfo example (trying to free the gridptr->rowlon pointer), and leaves an unusable output file. (Contrast with the same command under 1.7.0, which works.)
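Since the abort leaves an unusable output file behind, scripts that call cdo may want to avoid picking that file up by accident. Below is a minimal sketch of a wrapper that runs a command and deletes the named output if the command does not exit cleanly. The function name run_and_clean is hypothetical and this is not part of cdo; it also assumes that any non-zero exit status means the output should be discarded.

```python
import os
import subprocess


def run_and_clean(cmd, output_path):
    """Run cmd; if it fails or is killed by a signal, delete output_path.

    Returns True if the command exited with status 0, False otherwise.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # On POSIX, a negative return code means the process was
        # terminated by a signal (for example SIGABRT).
        if os.path.exists(output_path):
            os.remove(output_path)
        return False
    return True


# Hypothetical usage with the mergetime command from this thread:
# ok = run_and_clean(
#     ["cdo", "-M", "-f", "grb", "mergetime",
#      "ggam201210290600.grb", "ggam201210291200.grb", "test.grb"],
#     "test.grb",
# )
```

Discarding the output on any failure is the cautious choice here, because the crash happens during exit and a partly written file may still be present.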
It is possible to work around the problem using -R (convert reduced to regular Gaussian grid), in which case CDO 1.7.1 and 1.7.0 give identical output, but these output files are ~50% larger than with the reduced grid.
Note that full Gaussian grid files are usually only ~10% larger than reduced Gaussian grid files.
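For a rough sense of scale, the raw grid-point counts can be compared. This is a minimal sketch using the values cdo sinfo reports for the test file in this thread (points=88838, nlat=256, np=128), and it assumes that the regular Gaussian grid has 4 * np longitudes on every latitude row. It compares point counts only; actual file sizes also depend on GRIB packing and headers, so it does not settle which of the two size estimates above is typical.

```python
REDUCED_POINTS = 88838  # "points=88838" from cdo sinfo
NLAT = 256              # "nlat=256" from cdo sinfo
NP = 128                # "np=128" from cdo sinfo

# Assumption: a regular Gaussian grid has 4 * NP longitudes per latitude.
NLON_REGULAR = 4 * NP

regular_points = NLAT * NLON_REGULAR
ratio = regular_points / REDUCED_POINTS

print(regular_points)  # 131072
print(f"regular grid has {(ratio - 1) * 100:.1f}% more points than reduced")
```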
Also, if the command is run without the -f grb option, it seems to generate a valid reduced Gaussian grid output, but still crashes with an invalid pointer:
cdo -M mergetime ggam201210290600.grb ggam201210291200.grb test2.grb
Reply from the maintainer:
"This bug will be fixed in the next CDO release. See also: https://code.zmaw.de/issues/6780" (although I can't access the link)
1.7.2 is now out, so in principle this should fix it.
cdo 1.7.2 builds, and the originally reported bug is fixed in the test command.
[builderdev@builder SPECS]$ cdo sinfo /tmp/ggam201210290600.grb
File format : GRIB
-1 : Institut Source Steptype Levels Num Points Num Dtype : Parameter ID
1 : ECMWF unknown instant 60 1 88838 1 P16 : 133.128
2 : ECMWF unknown instant 60 1 88838 1 P16 : 203.128
3 : ECMWF unknown instant 60 1 88838 1 P16 : 246.128
4 : ECMWF unknown instant 60 1 88838 1 P16 : 247.128
5 : ECMWF unknown instant 60 1 88838 1 P8 : 248.128
Grid coordinates :
1 : gaussian reduced : points=88838 nlat=256 np=128
lat : 89.46282 to -89.46282 degrees_north
Vertical coordinates :
1 : hybrid : levels=60
lev : 1 to 60 by 1 level
available : vct
Time coordinate : 1 step
RefTime = 2012-10-29 06:00:00 Units = hours Calendar = proleptic_gregorian
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
2012-10-29 06:00:00
cdo sinfo: Processed 5 variables over 1 timestep ( 0.02s )
User-reported (see FP 38766).
Trying to free an invalid pointer when accessing GRIB files with a reduced Gaussian grid. (User reports that files with spectral and/or full Gaussian gridded data are okay.) This problem was introduced in JAP 1.1-27.
Example:
gives