Open shaomeng opened 9 months ago
I can confirm that h5repack gives this error (with 8-bit word size, as needed). I've not patched my h5repack, but presumably one shouldn't have to with recent versions of HDF5--I, too, have 1.14.3 installed.
@markcmiller86 Any ideas?
The patch is only necessary for 1.8. Probably something wrong with the testing logic itself. I will take a look but not this week.
So, after a quick look, I don't see anything wrong with testing logic. The test is intended to h5repack
the test file, mesh.h5
producing mesh_repack.h5
which should be smaller than the original by at least 20%. The fact that h5repack
itself is reporting an issue suggests that it doesn't like a datatype its encountered in mesh.h5
. But, all previous versions of HDF5 have been fine with that file. What does h5dump mesh.h5
or h5ls -vlrd mesh.h5
produce on either your system, @lindstro or your system @shaomeng using the HDF5 1.14.3 you have installed there?
Hi Mark, I have the following output from h5dump mesh.h5
(attached txt file).
mesh.h5.txt
So, something has me confused but in test/Makefile
here...
there is a redirection of both stdout
and stderr
with 2>&1 1>/dev/null
which means we should NOT SEE any HDF5 error logging anywhere. But, we are. Why? It might be worth removing that rediction and doing make test
again and see what else it indicates is going wrong.
Next, it appears to complete the h5repack
with 0
return value because it is NOT exiting the test here...
Instead, it is exiting at the check of the size ratio. So, I think h5repack
has encountered some unknown issue with the datatypes in mesh.h5
and refused to compress the file. You could confirm by looking for an mesh_repack.h5
and see if it is smaller (or different using h5diff
) than mesh.h5
.
It produces the kind of output you might expect, with no warnings or errors. Output attached.
Could it be that H5T_STD_U8LE
and H5T_STD_I8LE
are not handled? zfp supports only int32
, int64
, float
, and double
.
Here's part of the output I'm seeing:
h5repack -n -f UD=32013,0,4,3,0,3539053052,1062232653 ...HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
#000: /tmp/hdf5-20240118-4051-ziphrl/hdf5-1.14.3/src/H5T.c line 2052 in H5Tget_class(): not a datatype
major: Invalid arguments to routine
minor: Inappropriate type
...
[FAILED (size ratio) ]
Hmm...I think you could be right. In the mesh.h5
most of the datasets are int, float or double. However, there are a few there which are not...
[scratlantis:zfp_filter/H5Z-ZFP/test] miller86% h5ls -vlr mesh.h5 | grep -B 4 signed
/Pressure Dataset {10/10, 20/20, 30/30}
Location: 1:42412
Links: 1
Storage: 24000 logical bytes, 24000 allocated bytes, 100.00% utilization
Type: native unsigned int
--
/Pressure_2D Dataset {20/20, 30/30}
Location: 1:1400
Links: 1
Storage: 600 logical bytes, 600 allocated bytes, 100.00% utilization
Type: native unsigned char
--
/VelocityX_2D Dataset {21/21, 31/31}
Location: 1:1672
Links: 1
Storage: 651 logical bytes, 651 allocated bytes, 100.00% utilization
Type: native signed char
What happens for you if you adjust the Makefile to read...
-l X,Y,Z,Indexes:CHUNK=217 \
-l Indexes2:CHUNK=1517 \
-l Pressure2,Pressure3:CHUNK=10x20x5 \
-l Stress,Velocity,Stress2,Velocity2,Stress3,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 \
-l XY:CHUNK=651x1 \
-l XYZ:CHUNK=217x1 \
-l XYZ2:CHUNK=1617x1 \
-l XYZ3:CHUNK=33x1 \
which removes the request to repack Pressure
, Pressure_2D
and VelocityX_2D
?
Well, I think its just VelocityX_2D
and Pressure_2D
that are the culprits. Interesting thing...earlier versons of HDF5 I think were silently skipping these or maybe h5repack
continued past them in earlier versions of HDF5 but now maybe stops upon encountering the error?
What happens for you if you adjust the Makefile to read...
Same errors as far as I can tell.
FWIW, the test passes if I leave out HDF5_PLUGIN_PATH=../src/plugin
. Not sure what HDF5 does in this case when requesting zfp compression.
So, I just tried with master
on HDF5-1.14.3 with gcc on macOS and do not see this error. I did a make clean
first.
FWIW, the test passes if I leave out
HDF5_PLUGIN_PATH=../src/plugin
. Not sure what HDF5 does in this case when requesting zfp compression.
It should fail the size ratio if you do that. Mine does....
Plugin precision tests ...................................... [PASSED]
If using HDF5-1.8, make sure you have patched repack
h5repack -n -f UD=32013,0,4,3,0,3539053052,1062232653 ... [FAILED (size ratio) ]
./test_write_plugin zfpmode=5 ............................... [PASSED]
h5diff -v -d 0.00001 test_zfp_le.h5 test_zfp_be.h5 compressed compressed ........ [PASSED]
h5dump bigendian.h5 ............................................................. [PASSED]
I wonder if it is somehow finding an older version of the plugin filter code somewhere in your path, ld lib var or something?
BTW...I am working from H5Z-ZFP master
branch and ZFP develop
branch.
So, I just tried with
master
on HDF5-1.14.3 with gcc on macOS and do not see this error. I did amake clean
first.
I tried make clean
and also the remove of the request to repack Pressure
, Pressure_2D
and VelocityX_2D
, it's still giving the same error.
In my case, I'm pretty sure that this is the only version of H5Z-ZFP I have; I just started to interact with this piece of software a few days ago ;) (And yes, I'm also using H5Z-ZFP master
, ZFP develop
.) I'm having this problem on ubuntu 22.04 though.
Some potential differences: I'm building H5Z-ZFP using GNU make and AppleClang 15.0.0. I'm using the master
branch of both H5Z-ZFP and zfp (I had some vague recollection that there was a weird interaction between H5Z-ZFP and the zfp develop
branch, but maybe that was pre 1.0.0).
I wonder if it is somehow finding an older version of the plugin filter code somewhere in your path, ld lib var or something?
LD_LIBRARY_PATH
is not set, and I can confirm using DYLD_PRINT_LIBRARIES
the correct path to the plugin.
@lindstro or @shaomeng when you get a chance, can you attach a copy of mesh_repack.h5
you get (assuming you do indeed get one...I think you are because its failing the size ratio check, not the first step in performing the repack
)?
@lindstro or @shaomeng when you get a chance, can you attach a copy of
mesh_repack.h5
you get (assuming you do indeed get one...I think you are because its failing the size ratio check, not the first step in performing therepack
)?
Hi Mark, I do get one! Please see the attachment mesh_repack.h5.tar.gz
So my mesh_repack.h5 does not contain any zfp magic numbers (0x0570667a
in little endian), suggesting that compression is not even being invoked.
So my mesh_repack.h5 does not contain any zfp magic numbers (
0x0570667a
in little endian), suggesting that compression is not even being invoked.
Well, is the mesh_repack.h5
you are looking at one you generated by disabling setting of HDF5_PLUGIN_PATH
as you mentioned above
Hi Mark, I do get one! Please see the attachment mesh_repack.h5.tar.gz
Thanks. That file appears to be an exact copy of mesh.h5
with no datasets compressed. So, the repack
operation produced an output but no compression was employed. I don't think h5repack
will return non-zero return value in this case. I will have to check. But, pretty sure it returned 0 implying the operation succeeded. So, it went on to compare the sizes of the files and then failed the size ratio as @shaomeng original log shows (e.g. [FAILED (size ratio) ]
).
So, the question is, why isn't it applying the compression? The warning messages about not a data type are certainly a clue. I am still trying to get to bottom of it.
@lindstro is your mac Intel or Apple chips?
I am thinking difference in behavior may have to do with how we have respectively compiled HDF5 1.14.3. Can each of you also past your CMake or autoconf command-line for how you configured HDF5?
Well, is the mesh_repack.h5 you are looking at one you generated by disabling setting of HDF5_PLUGIN_PATH as you mentioned https://github.com/LLNL/H5Z-ZFP/issues/137#issuecomment-1967545327
No, the path is correct and the plugin is loaded:
dyld[60726]: <FBD8E492-073B-3FDF-A6D2-3FE5D0E53BEA> /.../H5Z-ZFP/src/plugin/libh5zzfp.so
dyld[60726]: <1B084037-309A-3541-8B17-20022829C01E> /opt/homebrew/Cellar/hdf5/1.14.3/lib/libhdf5.310.3.0.dylib
dyld[60726]: <BC330C52-DB91-346D-830B-514BDC7F1043> /.../H5Z-ZFP/zfp/lib/libzfp.1.0.1.dylib
@lindstro is your mac Intel or Apple chips?
I have an Apple M2 Max.
I am thinking difference in behavior may have to do with how we have respectively compiled HDF5 1.14.3. Can each of you also past your CMake or autoconf command-line for how you configured HDF5?
I couldn't get CMake to override the path to zfp (I tried ZFP_DIR
); I don't know how it's finding zfp under my home directory. So I used GNU make:
make ZFP_HOME=`pwd`/zfp HDF5_HOME=/opt/homebrew/Cellar/hdf5/1.14.3 HDF5_PLUGIN_PATH=`pwd`/src/plugin all
@jhendersonHDF wanted to ask if you could have a looksee here briefly and see if anything looks familiar. It kinda sounds like this HDF forum issue but we're not seg-faulting. Its just like h5repack
is deciding it cannot invoke the requested filtering operations due to some issue in recognizing the datatype.
For reference, here is the command that works for me but fails for the other two users here...
env \
LD_LIBRARY_PATH=/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/lib: \
HDF5_PLUGIN_PATH=../src/plugin
/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/bin/h5repack \
-n -f UD=32013,1,4,3,0,3539053052,1062232653 \
-l X,Y,Z,Indexes:CHUNK=217 \
-l Indexes2:CHUNK=1517 \
-l Pressure,Pressure2,Pressure3:CHUNK=10x20x5 \
-l Pressure_2D:CHUNK=10x20 \
-l Stress,Velocity,Stress2,Velocity2,Stress3,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 \
-l VelocityX_2D:CHUNK=21x31 \
-l XY:CHUNK=651x1 \
-l XYZ:CHUNK=217x1 \
-l XYZ2:CHUNK=1617x1 \
-l XYZ3:CHUNK=33x1 \
mesh.h5 mesh_repack.h5
@markcmiller86 Is it possible that you built a shared library only version of HDF5 while others here have built either a static only version or a static + shared version? This looks suspiciously similar to problems I've seen in the past where using an HDF5 tool which is statically linked to HDF5 along with a dynamically loaded plugin that is linked to the shared version of the library causes two HDF5s to be loaded into memory and results in very odd issues. If an HDF5 build was built with both static and shared, there will be two versions of each tool, e.g. h5dump
and h5dump-shared
and the shared versions are the correct ones to use in this case. This has been a problem often enough that we're finally correcting this in https://github.com/HDFGroup/hdf5/pull/4046 by only ever building one set of tools and linking to the shared library by default if available.
@jhendersonHDF thanks for the quick look. I do indeed have shared and static build.
otool -L ~/silo/hdf5-1.14.3/build_serial/myinstall/bin/h5repack
/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/bin/h5repack:
/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/lib/libhdf5.310.dylib (compatibility version 314.0.0, current version 314.0.0)
/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
I am thinking difference in behavior may have to do with how we have respectively compiled HDF5 1.14.3. Can each of you also past your CMake or autoconf command-line for how you configured HDF5?
Hi Mark, I followed this instruction to build HDF5, and I didn't change anything besides the installation prefix. I do see that I have both h5repack
and h5repack-shared
in the installed bin
directory, so it might be the issue??
@lindstro I have intel Mac but I do have an M1 I can try to test on. Also, my 1.0.0 zfp lib by default builds static, libzfp.a
. When H5Z-ZFP is built, it pulls that statically into the .so
file that gets created for the HDF5 filter/plugin. Thats how it is "finding" zfp. Its statically built into the plugin.
@markcmiller86 Interesting that your h5repack
seems to be linking to the shared library version of HDF5 if you have a static + shared build; I would've expected it to be linking to the static library. But since your version is linking to the shared library, that's likely why you have no issues. For others here, I would try editing what's necessary to make the tests use h5repack-shared
instead and see if that works. By default, HDF5's CMake should build both static and shared libs and you'll end up with this problem since the expected name for the tool (h5repack
, for instance) actually points to the statically-linked tool.
I used homebrew to install HDF5.
otool -L `where h5repack`
/opt/homebrew/bin/h5repack:
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.12)
/opt/homebrew/opt/libaec/lib/libsz.2.dylib (compatibility version 2.0.0, current version 2.0.1)
Evidently it was linked to the static library.
Evidently it was linked to the static library.
Ok, so thats a problem. A statically linked tool cannot load a plugin. So, the test cannot be properly executed. So, we should detect that and just fail associated tests.
This must be the explanation for failed test here.
So, we should detect that and just fail associated tests.
You might also be able to detect if the "-shared" version of a tool exists and use that if available.
So, we should detect that and just fail associated tests.
You might also be able to detect if the "-shared" version of a tool exists and use that if available.
Indeed, the test passes with h5repack-shared
.
@shaomeng see if this branch, mcm86-28feb24-static-hdf5-failures
works for you.
@jhendersonHDF thanks for your attention and time...probably saved me a few hours 💪🏻
@shaomeng see if this branch,
mcm86-28feb24-static-hdf5-failures
works for you.
Hi Mark, the new branch still fails, with the same errors.
If I manually change the command in make check
from h5repack
to h5repack-shared
then the tests pass.
@markcmiller86 You probably need to check if the static lib is present rather than checking if the shared lib is missing. If static is present at all then the regular named tools should end up linked statically. If the static lib is present you should either skip the test like in that PR or prefer the -shared tool when the static lib is present and the regular tool when it's not.
That is, I guess you should aim for HDF5_IS_SHARED_ONLY rather than HDF5_IS_STATIC_ONLY
Ok, thanks for info @jhendersonHDF and @shaomeng. I will return to this next week. Other fires to put out now.
Something else that really troubles me here is that many of the tests use HDF5 tools (h5diff
, h5dump
, and h5repack
) to confirm tests passed or failed. So, IMHO, if the test involving h5repack
failed for @shaomeng because it was a statically linked executable, then many of the other tests should have failed for the same reason. Yet, his .log
file shows them all passing. I worry that somehow the testing logic is getting defeated...that those tests also should have failed but the detection logic isn't catching that.
Ok, yeah, I am seeing some issues with test logic.
h5dump
mostly just check dataset metadata and do not actually require the filter to do their work. Nonetheless, we specify HDF5_PLUGIN_PATH
env. variable for those tests. That is unnecessary and confusing. So, those can indeed pass even when it is a statically linked executable and that is ok. However, there is one h5dump
-based test that is designed to check endianness handling and the pass/fail logic for that test is not robust enough.test-accuracy
make target (which uses H5Z-ZFP as a plugin with ZFP in accuracy mode) and the test-lib-accuracy
make target (which uses H5Z-ZFP as a lib with ZFP in accuracy mode) which are basically testing the exact same cases except using H5Z-ZFP a plugin vs. lib, wind up doing the detection for incorrect results differently. The plugin tests use h5diff
whereas the lib tests read the data back in with the test_read_lib
which likely uses different arithmetic to test accuracy.--disable-shared
, there is no way any of the testing features using H5Z-ZFP as a plugin can work. Only the lib based tests can work. The reverse situation (--disable-static
) does not have this constraint. It still links H5Z-ZFP statically for the lib tests but winds up using a shared HDF5 lib for them and of course the plugin tests operate fine.--disable-shared
then HDF5 lib might be only shared or both static and shared. If its both, the shared lib version of the tools have -shared
appended to their names.
-shared
in their names.@markcmiller86 I mostly use CMake these days, but as far as I can tell the default in Autotools is to always build both shared and static (same as CMake), but link to the shared library by default if available. I believe the "-shared" tools concept has always been unique to the CMake builds, which complicates things further. When https://github.com/HDFGroup/hdf5/pull/4046 is merged, CMake and Autotools should have identical behavior, but that doesn't really help the current situation.
I was following the installation steps to compile HDF5, ZFP, and H5Z-ZFP on my system. However, the
make check
fails with a datatype error:(Note: all other tests succeed except for this one. The full
make check
output is attached: h5z-zfp.log).I have HDF5 version 1.14.3, ZFP from the main branch as of 2/24/2024, and H5Z-ZFP from the main branch as of 2/24/2024. I'm not sure if the failed checks are consequential or not.