LLNL / H5Z-ZFP

A registered ZFP compression plugin for HDF5

Making use of the filter in NetCDF-4? #143

Closed shaomeng closed 4 months ago

shaomeng commented 5 months ago

I'm wondering if @markcmiller86 has any experience making the HDF5 filter work with the NetCDF library? It appears to me that the actual work would be minimal, but something has to happen to allow NetCDF to understand particular compression parameters in applications such as ncgen and nccopy. The best documentation I can find still isn't very helpful, so I was thinking maybe Mark knows a few things? Thanks!

lindstro commented 5 months ago

I'll let @markcmiller86 respond as he has more experience with these issues, but I wanted to point out that Leigh Orf has had some partial success with HDF5 compression filters and NetCDF4. His example may not be exactly what you had in mind, but at least it shows a way of generating zfp-compressed NetCDF files.

markcmiller86 commented 4 months ago

@shaomeng sorry for delay in responding to your inquiry.

So, H5Z-ZFP supports HDF5's generic filter interface and it looks like NetCDF supports passing generic parameters through to HDF5.

Suppose you want to use H5Z-ZFP rate mode with a rate of 4.75 bits on some NetCDF data you are writing. Here is what I think it would look like...

unsigned int cd_vals[6] = {0,0,0,0,0,0};

/* set ZFP mode to rate */
cd_vals[0] = 1; /* 1=rate mode */

/* set rate for rate mode */
double rate = 4.75;
double *p = (double *) &cd_vals[2];
*p = rate; /* copies double into positions 2 and 3 of cd_vals */

/* define stuff to netcdf */
nc_def_var_filter(ncid, varid, 32013, 6, cd_vals);

NetCDF docs say nc_def_var_filter ... This must be invoked after the variable has been created and before nc_enddef is invoked.

So, once you do the above, when you go to call nc_put_var_xxx, it will compress the data as you specify.

Does this make sense?

markcmiller86 commented 4 months ago

Ok, so I just tested the above using NetCDF's simple_xy_nc4_wr.c example from its examples directory using NetCDF 4.9.2 and HDF5 1.12.0. Here is what I needed to do...

  1. configure and build netcdf with hdf5 support

    [scratlantis:~/silo/netcdf-c-4.9.2] miller86% pwd
    /Users/miller86/silo/netcdf-c-4.9.2
    [scratlantis:~/silo/netcdf-c-4.9.2] miller86% env CPPFLAGS=-I/Users/miller86/silo/hdf5-1.12.0/build_default/myinstall/include \
    LDFLAGS=-L/Users/miller86/silo/hdf5-1.12.0/build_default/myinstall/lib \
    ./configure --enable-hdf5 --prefix=`pwd`/myinstall
    [scratlantis:~/silo/netcdf-c-4.9.2] miller86%  make install
    [scratlantis:~/silo/netcdf-c-4.9.2] miller86% ls -R myinstall/
    bin  include lib share
    
    myinstall//bin:
    nc-config    nc4print    nccopy      ncdump      ncgen       ncgen3      ocprint
    
    myinstall//include:
    netcdf.h         netcdf_filter.h         netcdf_json.h
    netcdf_aux.h         netcdf_filter_build.h       netcdf_mem.h
    netcdf_dispatch.h        netcdf_filter_hdf5_build.h  netcdf_meta.h
    
    myinstall//lib:
    libnetcdf.19.dylib   libnetcdf.dylib     libnetcdf.settings
    libnetcdf.a      libnetcdf.la        pkgconfig
  2. Modify simple_xy_nc4_wr.c to include a section that writes the same data but with zfp compression

    Original code:

    #include <stdlib.h>
    #include <stdio.h>
    #include <netcdf.h>

    /* This is the name of the data file we will create. */
    #define FILE_NAME "simple_xy_nc4.nc"

    /* We are writing 2D data, a 60 x 120 grid. */
    #define NDIMS 2
    #define NX 60
    #define NY 120

    /* Handle errors by printing an error message and exiting with a
     * non-zero status. */
    #define ERRCODE 2
    #define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

    int
    main()
    {
       int ncid, x_dimid, y_dimid, varid;
       int dimids[NDIMS];
       size_t chunks[NDIMS];
       int shuffle, deflate, deflate_level;
       int data_out[NX][NY];
       int x, y, retval;

       /* Set chunking, shuffle, and deflate. */
       shuffle = NC_SHUFFLE;
       deflate = 1;
       deflate_level = 1;

       /* Create some pretend data. If this wasn't an example program, we
        * would have some real data to write, for example, model output. */
       for (x = 0; x < NX; x++)
          for (y = 0; y < NY; y++)
             data_out[x][y] = x * NY + y;

       /* Create the file. The NC_NETCDF4 parameter tells netCDF to create
        * a file in netCDF-4/HDF5 standard. */
       if ((retval = nc_create(FILE_NAME, NC_NETCDF4, &ncid)))
          ERR(retval);

       /* Define the dimensions. */
       if ((retval = nc_def_dim(ncid, "x", NX, &x_dimid)))
          ERR(retval);
       if ((retval = nc_def_dim(ncid, "y", NY, &y_dimid)))
          ERR(retval);

       /* Set up variable data. */
       dimids[0] = x_dimid;
       dimids[1] = y_dimid;
       chunks[0] = NX/4;
       chunks[1] = NY/4;

       /* Define the variable. */
       if ((retval = nc_def_var(ncid, "data", NC_INT, NDIMS,
                                dimids, &varid)))
          ERR(retval);
       if ((retval = nc_def_var_chunking(ncid, varid, 0, &chunks[0])))
          ERR(retval);
       if ((retval = nc_def_var_deflate(ncid, varid, shuffle, deflate,
                                        deflate_level)))
          ERR(retval);

       /* No need to explicitly end define mode for netCDF-4 files. Write
        * the pretend data to the file. */
       if ((retval = nc_put_var_int(ncid, varid, &data_out[0][0])))
          ERR(retval);

       /* Close the file. */
       if ((retval = nc_close(ncid)))
          ERR(retval);

       printf("*** SUCCESS writing example file simple_xy_nc4.nc!\n");
       return 0;
    }

    Modified code to include zfp example:

    #include <stdlib.h>
    #include <stdio.h>
    #include <netcdf.h>

    /* This is the name of the data file we will create. */
    #define FILE_NAME "simple_xy_nc4.nc"

    /* We are writing 2D data, a 60 x 120 grid. */
    #define NDIMS 2
    #define NX 60
    #define NY 120

    /* Handle errors by printing an error message and exiting with a
     * non-zero status. */
    #define ERRCODE 2
    #define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

    int
    main()
    {
       int ncid, x_dimid, y_dimid, varid;
       int dimids[NDIMS];
       size_t chunks[NDIMS];
       int shuffle, deflate, deflate_level;
       int data_out[NX][NY];
       int x, y, retval;

       /* Set chunking, shuffle, and deflate. */
       shuffle = NC_SHUFFLE;
       deflate = 1;
       deflate_level = 1;

       /* Create some pretend data. If this wasn't an example program, we
        * would have some real data to write, for example, model output. */
       for (x = 0; x < NX; x++)
          for (y = 0; y < NY; y++)
             data_out[x][y] = x * NY + y;

       /* Create the file. The NC_NETCDF4 parameter tells netCDF to create
        * a file in netCDF-4/HDF5 standard. */
       if ((retval = nc_create(FILE_NAME, NC_NETCDF4, &ncid)))
          ERR(retval);

       /* Define the dimensions. */
       if ((retval = nc_def_dim(ncid, "x", NX, &x_dimid)))
          ERR(retval);
       if ((retval = nc_def_dim(ncid, "y", NY, &y_dimid)))
          ERR(retval);

       /* Set up variable data. */
       dimids[0] = x_dimid;
       dimids[1] = y_dimid;
       chunks[0] = NX/4;
       chunks[1] = NY/4;

       /* Define the variable. */
       if ((retval = nc_def_var(ncid, "data", NC_INT, NDIMS,
                                dimids, &varid)))
          ERR(retval);
       if ((retval = nc_def_var_chunking(ncid, varid, 0, &chunks[0])))
          ERR(retval);
       if ((retval = nc_def_var_deflate(ncid, varid, shuffle, deflate,
                                        deflate_level)))
          ERR(retval);

       /* No need to explicitly end define mode for netCDF-4 files. Write
        * the pretend data to the file. */
       if ((retval = nc_put_var_int(ncid, varid, &data_out[0][0])))
          ERR(retval);

       {
          int var2id;
          if ((retval = nc_def_var(ncid, "data2", NC_INT, NDIMS,
                                   dimids, &var2id)))
             ERR(retval);

          unsigned int cd_vals[6] = {0,0,0,0,0,0};

          /* set ZFP mode to rate */
          cd_vals[0] = 1; /* 1=rate mode */

          /* set rate for rate mode */
          double rate = 4.75;
          double *p = (double *) &cd_vals[2];
          *p = rate; /* copies double into positions 2 and 3 of cd_vals */

          /* define stuff to netcdf */
          nc_def_var_filter(ncid, var2id, 32013, 6, cd_vals);

          if ((retval = nc_put_var_int(ncid, var2id, &data_out[0][0])))
             ERR(retval);
       }

       /* Close the file. */
       if ((retval = nc_close(ncid)))
          ERR(retval);

       printf("*** SUCCESS writing example file simple_xy_nc4.nc!\n");
       return 0;
    }

  3. Compile the modified example code
    gcc simple_xy_nc4_wr.c -o simple_xy_nc4_wr \
    -I/Users/miller86/silo/netcdf-c-4.9.2/myinstall/include \
    -L/Users/miller86/silo/netcdf-c-4.9.2/myinstall/lib -lnetcdf
  4. Find the directory with the H5Z-ZFP plugin, libh5zzfp.so
    [scratlantis:~/silo/zfp_filter/H5Z-ZFP] miller86% ls /Users/miller86/silo/zfp_filter/H5Z-ZFP/src/plugin
    libh5zzfp.so
  5. Run the example and produce a netcdf file with zfp compressed data in it
    env HDF5_PLUGIN_PATH=/Users/miller86/silo/zfp_filter/H5Z-ZFP/src/plugin ./simple_xy_nc4_wr
  6. Examine the resulting file to confirm data was compressed with ZFP
    [scratlantis:~/silo/zfp_filter/H5Z-ZFP] miller86% h5ls -vlr simple_xy_nc4.nc | sed -e 's/^/   /'
    Opened "simple_xy_nc4.nc" with sec2 driver.
    /                        Group
       Attribute: _NCProperties scalar
           Type:      34-byte null-terminated ASCII string
           Data:  "version=2,netcdf=4.9.2,hdf5=1.12.0"
       Location:  1:48
       Links:     1
    /data                    Dataset {60/60, 120/120}
       Attribute: DIMENSION_LIST {2}
           Type:      variable length of
                      object reference
           Data:  (DATASET-1:239), (DATASET-1:563)
       Attribute: _Netcdf4Coordinates {2}
           Type:      native int
           Data:  0, 1
       Location:  1:887
       Links:     1
       Chunks:    {15, 30} 1800 bytes
       Storage:   28800 logical bytes, 5522 allocated bytes, 521.55% utilization
       Filter-0:  shuffle-2 OPT {4}
       Filter-1:  deflate-1 OPT {1}
       Type:      native int
    /data2                   Dataset {60/60, 120/120}
       Attribute: DIMENSION_LIST {2}
           Type:      variable length of
                      object reference
           Data:  (DATASET-1:239), (DATASET-1:563)
       Attribute: _Netcdf4Coordinates {2}
           Type:      native int
           Data:  0, 1
       Location:  1:1640
       Links:     1
       Chunks:    {60, 120} 28800 bytes
       Storage:   28800 logical bytes, 4275 allocated bytes, 673.68% utilization
       Filter-0:  H5Z-ZFP-1.1.0 (ZFP-1.0.0)-32013  {268456208, 91252346, 2952791924, 78643203}
       Type:      native int
    .
    .
    .
  7. If you want HDF5 tools to actually be able to read the data, you need to point them at the plugin using HDF5_PLUGIN_PATH
    [scratlantis:~/silo/zfp_filter/H5Z-ZFP] miller86% env HDF5_PLUGIN_PATH=/Users/miller86/silo/zfp_filter/H5Z-ZFP/src/plugin h5ls -vlrd simple_xy_nc4.nc
    .
    .
    .
    /data2                   Dataset {60/60, 120/120}
       Attribute: DIMENSION_LIST {2}
           Type:      variable length of
                      object reference
           Data:  (DATASET-1:239), (DATASET-1:563)
       Attribute: _Netcdf4Coordinates {2}
           Type:      native int
           Data:  0, 1
       Location:  1:1640
       Links:     1
       Chunks:    {60, 120} 28800 bytes
       Storage:   28800 logical bytes, 4275 allocated bytes, 673.68% utilization
       Filter-0:  H5Z-ZFP-1.1.0 (ZFP-1.0.0)-32013  {268456208, 91252346, 2952791924, 78643203}
       Type:      native int
       Data:
           (0,0) -1, 1, 1, 3, 3, 5, 5, 7, 7, 9, 9, 11, 11, 13, 13, 15, 15, 17, 17, 19, 19, 21, 21, 23, 23, 25, 25, 27, 27, 29, 29, 31,
           (0,32) 31, 33, 33, 35, 35, 37, 37, 39, 39, 41, 41, 43, 43, 45, 45, 47, 47, 49, 49, 51, 51, 53, 53, 55, 55, 57, 57, 59, 59,
           (0,61) 61, 61, 63, 63, 65, 65, 67, 67, 69, 69, 71, 71, 73, 73, 75, 75, 77, 77, 79, 79, 81, 81, 83, 83, 85, 85, 87, 87, 89,
           (0,90) 89, 91, 91, 93, 93, 95, 95, 97, 97, 99, 99, 101, 101, 103, 103, 105, 105, 107, 107, 109, 109, 111, 111, 113, 113,
    .
    .
    .
shaomeng commented 4 months ago

Hi @markcmiller86 , this is awesome, and I've tested it successfully with the SPERR plugin too! Sometimes the solution is as simple as one line of code :) Cheers!

leighorf commented 4 months ago

Just wanted to chime in and say that I've been using ZFP with netcdf4 for three or so years now with great success. With software like VAPOR and the hdf5plugin Python package I am able to share my ZFP-compressed netCDF files with collaborators who can visualize or read them without too much hassle. I am always glad to see lossy compression more accessible to the general user.