Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.
BSD 3-Clause "New" or "Revised" License

4.6.3-development ncgen hangs on bzip2 filter in install directory #1347

Open czender opened 5 years ago

czender commented 5 years ago

Environment Information

Summary of Issue

ncgen fails to build bzip2.nc from bzip2.cdl when HDF5_PLUGIN_PATH points to the install library directory, although it works when the variable points to the build library directory.

Steps to reproduce the behavior

  1. Grab mybzip2.cdl here or create it yourself from netcdf-c/examples/C/bzip2.nc.
  2. Set filters to the plugin build directory:
     export HDF5_PLUGIN_PATH=${DATA}/tmp/netcdf-c/plugins/.libs
  3. Create the file (works for me):
     ncgen -k netCDF-4 -b -o ~/nco/data/bzip2.nc ~/nco/data/bzip2.cdl
  4. Set filters to the plugin install directory (/usr/local for me):
     export HDF5_PLUGIN_PATH=/usr/local/lib
  5. Try to create the file again (hangs):
     ncgen -k netCDF-4 -b -o ~/nco/data/bzip2.nc ~/nco/data/bzip2.cdl

Any idea what's going on? This occurs with the latest (20190226 12:30 PT) master branch. Is the install mechanism copying everything needed to the install library directory? Some of the correct libraries seem to be in /usr/local:
    zender@skyglow:/data/zender/tmp/netcdf-c/examples/C$ ls -l /usr/local/lib/libbzip* /usr/local/lib/libmisc*
    -rwxr-xr-x. 1 root root   1033 Feb 26 12:32 /usr/local/lib/libbzip2.la
    -rwxr-xr-x. 1 root root 109856 Feb 26 12:32 /usr/local/lib/libbzip2.so
    -rwxr-xr-x. 1 root root   1046 Feb 26 12:32 /usr/local/lib/libmisc.la
    -rwxr-xr-x. 1 root root  12616 Feb 26 12:32 /usr/local/lib/libmisc.so

Yet it seems to be missing something that is only in the build directory and does not get installed or configured correctly to work from the install directory. The filter libraries are definitely found in the install directory, otherwise I would get an obnoxious EHDFERR. Instead it just hangs. This does not appear to be related to #1344. Please verify that the filters work as expected from the install directory and/or let me know the correct way to invoke filters.

DennisHeimbigner commented 5 years ago

Actually, I did not intend the .so files in netcdf-c/plugins to be installed. They are intended only for testing and examples. But ignoring that, I ran your example under Ubuntu 16.2 and it appears to work for me. This is using the current netcdf-c master. Try this to get some more info:

  1. Set the environment variable: export NETCDF_LOG_LEVEL=1
  2. Execute your ncgen command.
  3. Report back the output (including the logging info).
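
The logging steps above can be wrapped so that the complete output lands in one file that is easy to attach to the issue. This is a minimal sketch, assuming the library was built with logging support and writes its log messages to stderr; `capture_log` and the log file path are hypothetical names, not part of netCDF.

```shell
#!/bin/sh
# capture_log LEVEL OUTFILE COMMAND [ARGS...]
# Runs COMMAND with NETCDF_LOG_LEVEL set to LEVEL and saves both stdout
# and stderr to OUTFILE. (Hypothetical helper, for illustration only.)
capture_log() {
    level=$1
    outfile=$2
    shift 2
    NETCDF_LOG_LEVEL=$level "$@" >"$outfile" 2>&1
}

# Possible usage with the command from the report:
#   capture_log 1 /tmp/ncgen.log \
#       ncgen -k netCDF-4 -b -o ~/nco/data/bzip2.nc ~/nco/data/bzip2.cdl
#   cat /tmp/ncgen.log
```

The helper returns the exit status of the wrapped command, so a non-zero status can be reported alongside the log.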
DennisHeimbigner commented 5 years ago

Some other suggestions

  1. Run 'ldd /usr/local/lib/libbzip2.so' and check whether the supporting library paths look OK.
  2. Use 'export LD_LIBRARY_PATH=...' to ensure the supporting libraries are found.
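
The ldd check can be narrowed to just the dependencies the dynamic loader cannot resolve. The sketch below assumes the usual glibc `ldd` output, where an unresolved library appears as `name => not found`; `missing_deps` and `check_plugin` are hypothetical helper names.

```shell
#!/bin/sh
# missing_deps: reads `ldd` output on stdin and prints the name of each
# library that is reported as "not found".
missing_deps() {
    awk '/=> not found/ { print $1 }'
}

# check_plugin LIBRARY: runs ldd on LIBRARY and lists unresolved dependencies.
# Prints nothing when every dependency resolves.
check_plugin() {
    ldd "$1" | missing_deps
}

# Possible usage with the installed plugin from the report:
#   check_plugin /usr/local/lib/libbzip2.so
```

Any name it prints is a candidate for the directory to add to LD_LIBRARY_PATH.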
czender commented 5 years ago

Thanks for the quick response. Glad it works for you. It fails for me on Fedora Core 26.

What are the reasons not to install the plugin libraries in INSTALLDIR/lib? I understand that it might be rude to overwrite the default libbzip2 and so forth, so if that is the main reason, please consider allowing users to specify a configurable install location for any plugin filters that netCDF provides.

zender@skyglow:~/nco/data$ export NETCDF_LOG_LEVEL=1
zender@skyglow:~/nco/data$ export HDF5_PLUGIN_PATH=/usr/local/lib
zender@skyglow:~/nco/data$ ncgen -k netCDF-4 -b -o ~/nco/data/bzip2.nc ~/nco/data/bzip2.cdl
    HDF5 error messages have been turned off.
    NC4_create: path /home/zender/nco/data/bzip2.nc cmode 0x1000 parameters (nil)
    HDF5 error messages turned on.
    NC4_enddef: ncid 0x10000

NB: I am not killing the process. The "Killed" output happens automatically. At first I thought it was just hanging but after a minute or so it exits with "Killed". Here is the output with NETCDF_LOG_LEVEL=5:

zender@skyglow:~/nco/data$ export NETCDF_LOG_LEVEL=5
zender@skyglow:~/nco/data$ ncgen -k netCDF-4 -b -o ~/nco/data/bzip2.nc ~/nco/data/bzip2.cdl
                log_level changed to 5
    HDF5 error messages have been turned off.
    NC4_create: path /home/zender/nco/data/bzip2.nc cmode 0x1000 parameters (nil)
    HDF5 error messages turned on.
            nc4_create_file: path /home/zender/nco/data/bzip2.nc mode 0x1000
            nc4_grp_list_add: name / 
                nc4_create_file: set HDF raw chunk cache to size 4194304 nelems 1009 preemption 0.750000
            NC4_set_provenance: ncid 0x0
        NC4_def_dim: ncid 0x10000 name dim0 len 4
        NC4_def_dim: ncid 0x10000 name dim1 len 4
        NC4_def_dim: ncid 0x10000 name dim2 len 4
        NC4_def_dim: ncid 0x10000 name dim3 len 4
                nc4_find_dim: dimid 0
                nc4_find_dim: dimid 1
                nc4_find_dim: dimid 2
                nc4_find_dim: dimid 3
        NC4_def_var: name var type 5 ndims 4
                dimid[0] 0
                dimid[1] 1
                dimid[2] 2
                dimid[3] 3
                nc4_get_typelen_mem xtype: 5
                nc4_type_new: size 4 name float assignedid 5
                nc4_find_dim: dimid 0
                nc4_find_dim: dimid 1
                nc4_find_dim: dimid 2
                nc4_find_dim: dimid 3
                allocating array of 4 size_t to hold chunksizes for var var
                nc4_find_default_chunksizes2: name var dim 0 DEFAULT_CHUNK_SIZE 4194304 num_values 256.000000 type_size 4 chunksize 4
                nc4_find_default_chunksizes2: name var dim 1 DEFAULT_CHUNK_SIZE 4194304 num_values 256.000000 type_size 4 chunksize 4
                nc4_find_default_chunksizes2: name var dim 2 DEFAULT_CHUNK_SIZE 4194304 num_values 256.000000 type_size 4 chunksize 4
                nc4_find_default_chunksizes2: name var dim 3 DEFAULT_CHUNK_SIZE 4194304 num_values 256.000000 type_size 4 chunksize 4
                total_chunk_size 1024.000000
                nc4_get_typelen_mem xtype: 5
                new varid 0
        nc_def_var_extra: ncid 0x10000 varid 0
                nc4_get_typelen_mem xtype: 5
        nc_def_var_extra: ncid 0x10000 varid 0
        NC4_def_var_filter: ncid 0x10000 varid 0
    NC4_enddef: ncid 0x10000
        *** NetCDF-4 Internal Metadata: int_ncid 0x10000 ext_ncid 0x10000
        FILE - path: /home/zender/nco/data/bzip2.nc cmode: 0x1009 parallel: 0 redef: 0 fill_mode: 0 no_write: 0 next_nc_grpid: 1
         GROUP - / nc_grpid: 0 nvars: 1 natts: 0
         DIMENSION - dimid: 0 name: dim0 len: 4 unlimited: 0
         DIMENSION - dimid: 1 name: dim1 len: 4 unlimited: 0
         DIMENSION - dimid: 2 name: dim2 len: 4 unlimited: 0
         DIMENSION - dimid: 3 name: dim3 len: 4 unlimited: 0
         VARIABLE - varid: 0 name: var ndims: 4 dimscale: 0 dimids: 0 1 2 3
            nc4_rec_write_groups_types: grp->hdr.name /
            nc4_rec_write_metadata: grp->hdr.name /, bad_coord_order 0
                nc4_create_dim_wo_var: creating dim dim0
                nc4_create_dim_wo_var: about to H5Dcreate1 a dimscale dataset dim0
                write_netcdf4_dimid: writing secret dimid 0
                nc4_create_dim_wo_var: creating dim dim1
                nc4_create_dim_wo_var: about to H5Dcreate1 a dimscale dataset dim1
                write_netcdf4_dimid: writing secret dimid 1
                nc4_create_dim_wo_var: creating dim dim2
                nc4_create_dim_wo_var: about to H5Dcreate1 a dimscale dataset dim2
                write_netcdf4_dimid: writing secret dimid 2
                nc4_create_dim_wo_var: creating dim dim3
                nc4_create_dim_wo_var: about to H5Dcreate1 a dimscale dataset dim3
                write_netcdf4_dimid: writing secret dimid 3
                write_var: writing var var
            var_create_dataset:: name var
DennisHeimbigner commented 5 years ago

That 'Killed' is so weird. I have never seen it before. The only thing that comes to mind is some kind of memory error. Since we do not strip ncgen (I think), you might be able to run the ncgen command under gdb or valgrind to get some more info. Also, I assume you are not using mpio.

I think you are right that installing the bzip plugin is ok, although there is no point in installing libmisc since it is only for testing. As you note, I would worry about overwriting some existing bzip2 library. We need some alternate name, something like libhdf5bzip2.so.
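
Until the build offers an alternate name or location, the idea can be tried by hand: copy the plugin into a directory that holds nothing else and point HDF5_PLUGIN_PATH there. This is only a sketch. `stage_plugin` and the `$HOME/hdf5-plugins` directory are hypothetical, and it assumes HDF5 identifies a filter plugin by its registered filter ID rather than by its file name, so that renaming the copy is harmless.

```shell
#!/bin/sh
# stage_plugin SRC DEST_DIR NEW_NAME
# Copies the plugin library SRC into DEST_DIR under NEW_NAME, creating
# DEST_DIR if needed. (Hypothetical helper, for illustration only.)
stage_plugin() {
    src=$1
    dest_dir=$2
    new_name=$3
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/$new_name"
}

# Possible usage, with the alternate name suggested above:
#   stage_plugin /usr/local/lib/libbzip2.so "$HOME/hdf5-plugins" libhdf5bzip2.so
#   export HDF5_PLUGIN_PATH="$HOME/hdf5-plugins"
```

Keeping the plugin in its own directory also avoids any clash with a system library of the same name in /usr/local/lib.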

DennisHeimbigner commented 5 years ago

I found some info here: https://stackoverflow.com/questions/726690/what-killed-my-process-and-why
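
One way to check whether the kernel ended the process is to search the kernel log for out-of-memory messages. A minimal sketch follows; `oom_events` is a hypothetical helper, the exact message wording varies between kernel versions, and reading the kernel log may require elevated privileges on some systems.

```shell
#!/bin/sh
# oom_events: reads kernel log text on stdin and prints the lines that
# look like OOM-killer activity. The patterns are a best guess at the
# common wordings, not an exhaustive list.
oom_events() {
    grep -i -E 'out of memory|oom-kill|killed process'
}

# Possible usage:
#   dmesg | oom_events
#   journalctl -k | oom_events      # on systemd-based systems
```

A matching line that names ncgen would point to memory exhaustion rather than a crash inside the filter.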

czender commented 5 years ago

Thanks. I ran it again and watched memory with "top". It appears to exhaust memory, so the kernel OOM killer must have been invoked. I agree that the filter libraries either need a new name or their own install location so as not to conflict with existing libraries of the same name.

I will try to create a filter for Zstandard. If and when it works, would netCDF be interested in distributing it if I submit a PR?
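
While the cause is unknown, the failing command can be reproduced under a memory cap and a time limit so that it fails quickly instead of exhausting the machine. The sketch below uses the shell's `ulimit -v` and the coreutils `timeout` command; `run_guarded` is a hypothetical helper and the limits shown are arbitrary.

```shell
#!/bin/sh
# run_guarded SECONDS MAX_KB COMMAND [ARGS...]
# Runs COMMAND in a subshell with a virtual-memory cap of MAX_KB kilobytes
# and a wall-clock limit of SECONDS. The limits apply only to the subshell,
# so the calling shell is unaffected.
run_guarded() {
    secs=$1
    max_kb=$2
    shift 2
    (
        ulimit -v "$max_kb" || exit 125
        exec timeout "$secs" "$@"
    )
}

# Possible usage with the command from the report (2 GiB cap, 60 s limit):
#   HDF5_PLUGIN_PATH=/usr/local/lib run_guarded 60 2097152 \
#       ncgen -k netCDF-4 -b -o ~/nco/data/bzip2.nc ~/nco/data/bzip2.cdl
#   echo "exit status: $?"
```

An exit status of 124 means the time limit was reached; with the memory cap in place, an allocation failure should surface as an error or crash in the process itself instead of a kill by the kernel.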

DennisHeimbigner commented 5 years ago

We have never quite decided what to do about providing custom filters. Since you are going to have to register it with The HDF Group to get a standard filter ID number, it might be best to do what they suggest for now.

edwardhartnett commented 2 years ago

I believe the bzip filter should be installed, just as the zstandard filter is to be installed. (See #2294).