kalvdans closed this issue 4 years ago
I can reproduce this issue using c-blosc from master branch using the following steps:
git clone https://github.com/Blosc/hdf5-blosc.git
cd hdf5-blosc
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/tmp/inst -DPLUGIN_INSTALL_PATH=/tmp/inst/lib ..
make -j5 install # Will clone and install c-blosc from master branch into blosc/
export LD_LIBRARY_PATH=/tmp/inst/lib:$PWD/blosc/lib
export HDF5_PLUGIN_PATH=/tmp/inst/lib
h5import /dev/zero -d 1 -o /tmp/foo.hdf5 && h5repack -f UD=32001,0,0,0,0,4,1,2 /tmp/foo{,2}.hdf5
This was with the following HDF5 on Ubuntu 20.04:
estan@edison:~$ dpkg -l libhdf5-103
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=================-=======================-============-==================================================================
ii libhdf5-103:amd64 1.10.4+repack-11ubuntu1 amd64 Hierarchical Data Format 5 (HDF5) - runtime files - serial version
estan@edison:~$
Yes. Here HDF5 is trying to compress buffers that are 4 bytes long, and C-Blosc clearly cannot do that (the header alone takes 16 bytes). My guess is that blosc_compress() is returning a negative value indicating that the buffer cannot be compressed, and then HDF5 just copies the data as-is. Now, I suppose that a library like C-Blosc should be quiet in this regard. Hmmm, maybe these kinds of informational messages could be issued only when an environment variable (e.g. BLOSC_DEBUG) is set. Thoughts?
The documentation for blosc_compress() states:
A negative return value means that an internal error happened. This should never happen.
So if the hdf5-blosc plugin is relying on that to probe for the minimum size, it is doing something wrong; it should use BLOSC_MAX_OVERHEAD instead. Let us first verify that the bug is within the hdf5-blosc plugin, and then we can close this issue without removing the printout.
Another (more likely) possibility is that blosc_compress() is returning zero, which means:
If `src` buffer cannot be compressed into `destsize`, the return
value is zero and you should discard the contents of the `dest`
buffer.
In this case, both C-Blosc and the hdf5-blosc plugin are working correctly, and we only need to get rid of the warning (or preferably make it optional via a BLOSC_DEBUG environment variable).
I confirmed with
diff --git a/src/blosc_filter.c b/src/blosc_filter.c
index bfd8c3e..fab141b 100644
--- a/src/blosc_filter.c
+++ b/src/blosc_filter.c
@@ -218,6 +218,7 @@ size_t blosc_filter(unsigned flags, size_t cd_nelmts,
blosc_set_compressor(compname);
status = blosc_compress(clevel, doshuffle, typesize, nbytes,
*buf, outbuf, nbytes);
+ printf("status = %d\n", status);
if (status < 0) {
PUSH_ERR("blosc_filter", H5E_CALLBACK, "Blosc compression error");
goto failed;
that blosc_compress is returning -1:
estan@edison:~/hdf5-blosc/build$ h5import /dev/zero -d 1 -o /tmp/foo.hdf5 && h5repack -f UD=32001,0,0,0,0,4,1,2 /tmp/foo{,2}.hdf5
Output buffer size should be larger than 16 bytes
status = -1
Output buffer size should be larger than 16 bytes
status = -1
estan@edison:~/hdf5-blosc/build$
I'm surprised we're not getting an error printed from the HDF5 library, since the filter code handles this case using the following:
#define PUSH_ERR(func, minor, str, ...) H5Epush(H5E_DEFAULT, __FILE__, func, __LINE__, H5E_ERR_CLS, H5E_PLINE, minor, str, ##__VA_ARGS__)
.
.
.
if (status < 0) {
PUSH_ERR("blosc_filter", H5E_CALLBACK, "Blosc compression error");
goto failed;
}
So it is pushing an error message about this on the HDF5 error stack.
In any case, since the filter failed and reported the failure to HDF5 by returning 0, HDF5 proceeded to save the dataset unfiltered:
estan@edison:~/hdf5-blosc/build$ h5dump -H -p /tmp/foo.hdf5
HDF5 "/tmp/foo.hdf5" {
GROUP "/" {
DATASET "dataset0" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
STORAGE_LAYOUT {
CONTIGUOUS
SIZE 4
OFFSET 2048
}
FILTERS {
NONE
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE H5D_FILL_VALUE_DEFAULT
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_LATE
}
}
}
}
estan@edison:~/hdf5-blosc/build$
(Note the FILTERS { NONE }.)
If C-Blosc is returning -1, then this is a bug; it should be 0 ("cannot compress" rather than an internal error). This should be fixed. And the hdf5-blosc plugin should treat the 0 value differently (i.e. not try to push an error into HDF5 itself).
Regarding why HDF5 does not throw an error, it would be interesting to find out.
If C-Blosc is returning a -1, then this is a bug; it should be a 0 (cannot compress instead of an internal error).
I agree with this analysis. Don't bother with an environment variable; just delete the print statement. The code can be stepped through in a debugger if one is curious why it returns 0. I will do a pull request.
Regarding why HDF5 does not throw an error, it would be interesting to find out.
It seems the h5repack tool does not configure printing of the HDF5 error stack unless the undocumented option -E is given (see here and here).
If I add that option, I get the HDF5 error trace:
estan@edison:~$ h5import /dev/zero -d 1 -o /tmp/foo.hdf5 && h5repack -E -f UD=32001,0,0,0,0,4,1,2 /tmp/foo{,2}.hdf5
Output buffer size should be larger than 16 bytes
Output buffer size should be larger than 16 bytes
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140479245659200:
#000: ../../../src/H5D.c line 338 in H5Dclose(): can't decrement count on dataset ID
major: Dataset
minor: Unable to decrement reference count
#001: ../../../src/H5I.c line 1364 in H5I_dec_app_ref_always_close(): can't decrement ID ref count
major: Object atom
minor: Unable to decrement reference count
.
.
.
#014: ../../../src/H5Z.c line 1367 in H5Z_pipeline(): filter returned failure
major: Data filters
minor: Write failed
#015: src/blosc_filter.c line 228 in blosc_filter(): Blosc compression error
major: Data filters
minor: Callback failed
H5tools-DIAG: Error detected in HDF5:tools (1.10.4) thread 140479245659200:
#000: ../../../../../tools/src/h5repack/h5repack_copy.c line 354 in copy_objects(): do_copy_objects from </tmp/foo.hdf5> could not copy data to </tmp/foo2.hdf5>
major: Failure in tools library
minor: error in function
#001: ../../../../../tools/src/h5repack/h5repack_copy.c line 1083 in do_copy_objects(): H5Dclose failed
major: Failure in tools library
minor: error in function
estan@edison:~$
And the exit status is 255, indicating failure:
estan@edison:~$ h5import /dev/zero -d 1 -o /tmp/foo.hdf5 && h5repack -f UD=32001,0,0,0,0,4,1,2 /tmp/foo{,2}.hdf5
Output buffer size should be larger than 16 bytes
Output buffer size should be larger than 16 bytes
estan@edison:~$ echo $?
255
estan@edison:~$
If one extends the test case a bit, putting a second dataset /foo in the input file, one can see that this dataset is not present in the output file, since h5repack "gave up":
estan@edison:~$ h5import /dev/zero -d 1 -o /tmp/foo.hdf5 && h5import /dev/zero -d 1 -p /foo -o /tmp/foo.hdf5 && h5repack -f UD=32001,0,0,0,0,4,1,2 /tmp/foo{,2}.hdf5
Output buffer size should be larger than 16 bytes
Output buffer size should be larger than 16 bytes
estan@edison:~$ h5dump /tmp/foo2.hdf5
HDF5 "/tmp/foo2.hdf5" {
GROUP "/" {
DATASET "dataset0" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): 0
}
}
}
}
estan@edison:~$
So HDF5 is reporting a failure, and h5repack is handling it. It's just a little silent about it.
So HDF5 is reporting a failure, and h5repack is handling it. It's just a little silent about it.
Arguably, h5repack should handle this better: when it is going to exit with a nonzero status, it should not leave a partial output file behind.
Writing small arrays in HDF5 with Blosc as the filter seems to work fine, but an error message is printed on stderr.
version info: