deshaw / versioned-hdf5

Versioned HDF5 provides a versioned abstraction on top of h5py
https://deshaw.github.io/versioned-hdf5/

Slicetools update breaks local tests #352

Open peytondmurray opened 4 days ago

peytondmurray commented 4 days ago

Tests are failing for me locally. Assuming the problem was that I am using the latest hdf5 version, I tried both the stable version of hdf5 and hdf5-git from the AUR, but most tests fail with either one:

$ pytest . --import-mode=importlib versioned_hdf5/tests/test_api.py::test_stage_version
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0
project deps: h5py-3.11.0, ndindex-1.8
rootdir: /home/pdmurray/Desktop/workspace/versioned-hdf5
configfile: pyproject.toml
plugins: hypothesis-6.104.2, env-1.1.3
collected 1 item                                                                                                                                                                                                 

versioned_hdf5/tests/test_api.py F                                                                                                                                                                         [100%]

==================================================================================================== FAILURES ====================================================================================================
_______________________________________________________________________________________________ test_stage_version _______________________________________________________________________________________________

vfile = <VersionedHDF5File object "/tmp/pytest-of-pdmurray/pytest-7/test_stage_version0/file.hdf5" (mode r+)>

    def test_stage_version(vfile):
        """Test that versions can be staged and are the expected shape."""
        test_data = np.concatenate(
            (
                np.ones((2 * DEFAULT_CHUNK_SIZE,)),
                2 * np.ones((DEFAULT_CHUNK_SIZE,)),
                3 * np.ones((DEFAULT_CHUNK_SIZE,)),
            )
        )

        # Chunk size is intelligently selected based on first dataset written
        chunk_size = guess_chunk(test_data.shape, None, test_data.dtype.itemsize)[0]
        with vfile.stage_version("version1", "") as group:
            group["test_data"] = test_data

        version1 = vfile["version1"]
        assert version1.attrs["prev_version"] == "__first_version__"
        assert_equal(version1["test_data"], test_data)

        ds = vfile.f["/_version_data/test_data/raw_data"]

        # The dataset has 3 different arrays of size chunk_size, so should have
        # shape 3*chunk_size
        assert ds.shape == (3 * chunk_size,)
        assert_equal(ds[0 : 1 * chunk_size], 1.0)
        assert_equal(ds[1 * chunk_size : 2 * chunk_size], 2.0)
        assert_equal(ds[2 * chunk_size : 3 * chunk_size], 3.0)

        with vfile.stage_version("version2", "version1") as group:
>           group["test_data"][0] = 0.0

versioned_hdf5/tests/test_api.py:58: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
versioned_hdf5/wrappers.py:1395: in __setitem__
    self.dataset.__setitem__(index, value)
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
versioned_hdf5/wrappers.py:1025: in __setitem__
    if c not in self.id.data_dict:
versioned_hdf5/wrappers.py:1446: in data_dict
    self._data_dict = build_data_dict(dcpl, self.raw_data.name)
slicetools.pyx:124: in versioned_hdf5.slicetools.build_data_dict
    ???
slicetools.pyx:140: in versioned_hdf5.slicetools.build_data_dict
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: Could not get virtual filename

slicetools.pyx:164: ValueError
---------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------
HDF5-DIAG: Error detected in HDF5 (1.15.0):
  #000: /usr/src/debug/hdf5-git/hdf5/src/H5Pdcpl.c line 2397 in H5Pget_virtual_filename(): can't find object for ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
  #001: /usr/src/debug/hdf5-git/hdf5/src/H5Pint.c line 4095 in H5P_object_verify(): property list is not a member of the class
    major: Property lists
    minor: Can't compare objects
  #002: /usr/src/debug/hdf5-git/hdf5/src/H5Pint.c line 4046 in H5P_isa_class(): not a property list
    major: Invalid arguments to routine
    minor: Inappropriate type
================================================================================================ warnings summary ================================================================================================
../../../.pyenv/versions/3.12.3/envs/vhdf/lib/python3.12/site-packages/_pytest/config/__init__.py:1448
  /home/pdmurray/.pyenv/versions/3.12.3/envs/vhdf/lib/python3.12/site-packages/_pytest/config/__init__.py:1448: PytestConfigWarning: Unknown config option: import_mode

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info =============================================================================================
FAILED versioned_hdf5/tests/test_api.py::test_stage_version - ValueError: Could not get virtual filename
========================================================================================== 1 failed, 1 warning in 0.51s ==========================================================================================

OS: Linux 6.6.36-1-lts x86_64

hdf5 config

$ h5cc -showconfig
/usr/h5cc
dir is /usr
        SUMMARY OF THE HDF5 CONFIGURATION
        =================================

General Information:
-------------------
                   HDF5 Version: 1.15.0
                  Configured on: 2024-07-02
                  Configured by: Unix Makefiles
                    Host system: Linux-6.6.36-1-lts
              Uname information: Linux
                       Byte sex: little-endian
             Installation point: /usr

Compiling Options:
------------------
                     Build Mode: Release
              Debugging Symbols: OFF
                        Asserts: OFF
                      Profiling: OFF
             Optimization Level: OFF

Linking Options:
----------------
                      Libraries: 
  Statically Linked Executables: OFF
                        LDFLAGS: -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs -flto=auto
                     H5_LDFLAGS: 
                     AM_LDFLAGS: 
                Extra libraries: m;dl
                       Archiver: /usr/bin/ar
                       AR_FLAGS:
                         Ranlib: /usr/bin/ranlib

Languages:
----------
                              C: YES
                     C Compiler: /usr/bin/cc 14.1.1
                       CPPFLAGS: 
                    H5_CPPFLAGS: 
                    AM_CPPFLAGS: 
                         CFLAGS:   -std=c99 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection         -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/home/pdmurray/.cache/paru/clone/hdf5-git/src=/usr/src/debug/hdf5-git -flto=auto -fstdarg-opt -fdiagnostics-urls=never -fno-diagnostics-color -fmessage-length=0
                      H5_CFLAGS: -Wall;-Wcast-qual;-Wconversion;-Wextra;-Wfloat-equal;-Wformat=2;-Winit-self;-Winvalid-pch;-Wmissing-include-dirs;-Wshadow;-Wundef;-Wwrite-strings;-pedantic;-Wno-c++-compat;-Wbad-function-cast;-Wcast-align;-Wformat;-Wimplicit-function-declaration;-Wint-to-pointer-cast;-Wmissing-declarations;-Wmissing-prototypes;-Wnested-externs;-Wold-style-definition;-Wpacked;-Wpointer-sign;-Wpointer-to-int-cast;-Wredundant-decls;-Wstrict-prototypes;-Wswitch;-Wunused-but-set-variable;-Wunused-variable;-Wunused-function;-Wunused-parameter;-finline-functions;-Wno-aggregate-return;-Wno-inline;-Wno-missing-format-attribute;-Wno-missing-noreturn;-Wno-overlength-strings;-Wlarger-than=2560;-Wlogical-op;-Wframe-larger-than=16384;-Wpacked-bitfield-compat;-Wsync-nand;-Wno-unsuffixed-float-constants;-Wdouble-promotion;-Wtrampolines;-Wstack-usage=8192;-Wmaybe-uninitialized;-Wno-jump-misses-init;-Wstrict-overflow=2;-Wno-suggest-attribute=const;-Wno-suggest-attribute=noreturn;-Wno-suggest-attribute=pure;-Wno-suggest-attribute=format;-Wdate-time;-Warray-bounds=2;-Wc99-c11-compat;-Wincompatible-pointer-types;-Wint-conversion;-Wshadow;-Wduplicated-cond;-Whsa;-Wnormalized;-Wnull-dereference;-Wunused-const-variable;-Walloca;-Walloc-zero;-Wduplicated-branches;-Wformat-overflow=2;-Wformat-truncation=1;-Wrestrict;-Wattribute-alias;-Wshift-overflow=2;-Wcast-function-type;-Wmaybe-uninitialized;-Wcast-align=strict;-Wno-suggest-attribute=cold;-Wno-suggest-attribute=malloc;-Wattribute-alias=2;-Wmissing-profile;-Wc11-c2x-compat
                      AM_CFLAGS: 
               Shared C Library: YES
               Static C Library: YES

                        Fortran: OFF
               Fortran Compiler:  
                  Fortran Flags: 
               H5 Fortran Flags: 
               AM Fortran Flags: 
         Shared Fortran Library: YES
         Static Fortran Library: YES
               Module Directory: /home/pdmurray/.cache/paru/clone/hdf5-git/src/hdf5/_build/mod

                            C++: OFF
                   C++ Compiler:  
                      C++ Flags: 
                   H5 C++ Flags: 
                   AM C++ Flags: 
             Shared C++ Library: YES
             Static C++ Library: YES

                           JAVA: OFF
                  JAVA Compiler:  

Features:
---------
                     Parallel HDF5: OFF
  Parallel Filtered Dataset Writes: 
                Large Parallel I/O: 
                High-level library: ON
Dimension scales w/ new references: 
                  Build HDF5 Tests: ON
                  Build HDF5 Tools: ON
                   Build GIF Tools: OFF
                      Threadsafety: OFF
               Default API mapping: v116
    With deprecated public symbols: ON
            I/O filters (external):  DEFLATE
                  _Float16 support: ON
                     Map (H5M) API: OFF
                        Direct VFD: OFF
                        Mirror VFD: OFF
                     Subfiling VFD: OFF
                (Read-Only) S3 VFD: OFF
              (Read-Only) HDFS VFD: OFF
    Packages w/ extra debug output: 
                       API Tracing: OFF
              Using memory checker: OFF
                  Use file locking: best-effort
         Strict File Format Checks: OFF
      Optimization Instrumentation: 

ArvidJB commented 4 days ago

I think version 1.15 is still under development? The latest release is 1.14.* (https://github.com/HDFGroup/hdf5/releases). We're currently using 1.14.0.

  1. How would we specify a libhdf5 version <= 1.14 requirement?
  2. Can we add 1.15 support?

peytondmurray commented 4 days ago

Good question - I just tested with the latest 1.14 release too, and that doesn't work for me either. Maybe there's something else going on here :thinking:
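
The HDF5-DIAG output in the original report says the ID passed to H5Pget_virtual_filename() is "not a property list". One possibility worth ruling out, not confirmed anywhere in this thread, is that the process has loaded more than one copy of libhdf5 (for example one bundled with h5py and one from the system), since HDF5 IDs created by one copy are meaningless to another. A minimal Python sketch of that check on Linux follows; `hdf5_libraries` is a hypothetical helper name, and it only parses text in the /proc/<pid>/maps format.

```python
from pathlib import Path


def hdf5_libraries(maps_text):
    """Return the distinct libhdf5 shared-object paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and "libhdf5" in Path(fields[5]).name:
            paths.add(fields[5].strip())
    return sorted(paths)


# Usage on Linux, after importing h5py and versioned_hdf5 in the same process:
#     print(hdf5_libraries(Path("/proc/self/maps").read_text()))
```

More than one distinct libhdf5 path in the result (libhdf5_hl aside) would be worth investigating; a single path does not by itself rule out other causes.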

> How would we specify a libhdf5 version <= 1.14 requirement?

Good question. We should be able to specify the versions of our non-Python dependencies in meson.build.
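
A build-time constraint would live in meson.build; as a complement, a runtime guard in Python could look like the sketch below. This is only an illustration: `parse_hdf5_version`, `is_supported`, and the `(1, 15, 0)` bound are hypothetical, not a decision made in this thread.

```python
import re

# Hypothetical exclusive upper bound, used only for illustration.
MAX_EXCLUSIVE = (1, 15, 0)


def parse_hdf5_version(text):
    """Extract (major, minor, patch) from text such as 'HDF5 Version: 1.15.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no HDF5 version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version, max_exclusive=MAX_EXCLUSIVE):
    """Return True if the version tuple is strictly below the upper bound."""
    return version < max_exclusive


# Usage with h5py installed (not executed here):
#     import h5py
#     is_supported(h5py.version.hdf5_version_tuple)
```

Note that h5py reports the HDF5 version of the library it uses, which is not necessarily the library another extension module was linked against.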

> Can we add 1.15 support?

We'll definitely want to. I'll get to the bottom of this issue :point_up: first; that may make us 1.15-compatible anyway.

@ArvidJB Would you be able to post the output of h5cc -showconfig for your working installation?

ArvidJB commented 4 days ago

When I build versioned-hdf5 in a virtualenv I end up using the libhdf5 bundled with h5py:

>>> print(h5py.version.info)
Summary of the h5py configuration
---------------------------------

h5py    3.8.0
HDF5    1.14.0
Python  3.11.8 (main, Mar 15 2024, 12:37:54) [GCC 10.3.1 20210422 (Red Hat 10.3.1-1)]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.24.4
cython (built with) 0.29.36.1
numpy (built against) 1.23.2
HDF5 (built against) 1.14.0

Here's the output of h5pcc:

$ h5pcc -showconfig
            SUMMARY OF THE HDF5 CONFIGURATION
            =================================

General Information:
-------------------
                   HDF5 Version: 1.14.0
                  Configured on: Mon Sep 25 20:30:28 UTC 2023
                  Configured by: pyvenv@rhel8-exp-01.nyc.deshaw.com
                    Host system: x86_64-unknown-linux-gnu
              Uname information: Linux rhel8-exp-01.nyc.deshaw.com 4.18.0-477.21.1.el8_8.x86_64 #1 SMP Thu Jul 20 08:38:27 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
                       Byte sex: little-endian
             Installation point: /usr/local/python/python-3.11/std

Compiling Options:
------------------
                     Build Mode: production
              Debugging Symbols: no
                        Asserts: no
                      Profiling: no
             Optimization Level: high

Linking Options:
----------------
                      Libraries: static, shared
  Statically Linked Executables:
                        LDFLAGS:
                     H5_LDFLAGS:
                     AM_LDFLAGS:
                Extra libraries: -lz -ldl -lm
                       Archiver: ar
                       AR_FLAGS: cr
                         Ranlib: ranlib

Languages:
----------
                              C: yes
                     C Compiler: /usr/lib64/mpich/bin/mpicc ( MPICH version 3.4.2 Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/gcc-toolset-10/root/usr --mandir=/opt/rh/gcc-toolset-10/root/usr/share/man --infodir=/opt/rh/gcc-toolset-10/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-10.3.1-20210422/obj-x86_64-redhat-linux/isl-install --disable-libmpx --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux built with gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC))
                       CPPFLAGS:
                    H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200809L   -DNDEBUG -UH5_DEBUG_API
                    AM_CPPFLAGS:  -I/tmp/pip-install-uxkpxbzu/h5py_6eb1e255f2924d96aa10e089412445f0/hdf5/src/H5FDsubfiling
                        C Flags:
                     H5 C Flags:  -std=c99  -Wall -Wcast-qual -Wconversion -Wextra -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-include-dirs -Wshadow -Wundef -Wwrite-strings -pedantic -Wno-c++-compat -Wlarger-than=2560 -Wlogical-op -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wsync-nand -Wno-unsuffixed-float-constants -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wmaybe-uninitialized -Wdate-time -Warray-bounds=2 -Wc99-c11-compat -Wduplicated-cond -Whsa -Wnormalized -Wnull-dereference -Wunused-const-variable -Walloca -Walloc-zero -Wduplicated-branches -Wformat-overflow=2 -Wformat-truncation=1 -Wattribute-alias -Wcast-align=strict -Wshift-overflow=2 -Wattribute-alias=2 -Wmissing-profile -Wc11-c2x-compat -fstdarg-opt -fdiagnostics-urls=never -fno-diagnostics-color -s  -Wbad-function-cast -Wcast-align -Wformat -Wimplicit-function-declaration -Wint-to-pointer-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-sign -Wpointer-to-int-cast -Wredundant-decls -Wstrict-prototypes -Wswitch -Wunused-but-set-variable -Wunused-variable -Wunused-function -Wunused-parameter -Wincompatible-pointer-types -Wint-conversion -Wshadow -Wrestrict -Wcast-function-type -Wmaybe-uninitialized -Wno-aggregate-return -Wno-inline -Wno-missing-format-attribute -Wno-missing-noreturn -Wno-overlength-strings -Wno-jump-misses-init -Wstrict-overflow=2 -Wno-suggest-attribute=const -Wno-suggest-attribute=noreturn -Wno-suggest-attribute=pure -Wno-suggest-attribute=format -Wno-suggest-attribute=cold -Wno-suggest-attribute=malloc -O3
                     AM C Flags:
               Shared C Library: yes
               Static C Library: yes

                        Fortran: no

                            C++: no

                           Java: no

Features:
---------
                     Parallel HDF5: yes
  Parallel Filtered Dataset Writes: yes
                Large Parallel I/O: yes
                High-level library: yes
Dimension scales w/ new references: no
                  Build HDF5 Tests: yes
                  Build HDF5 Tools: yes
                   Build GIF Tools: no
                      Threadsafety: no
               Default API mapping: v114
    With deprecated public symbols: yes
            I/O filters (external): deflate(zlib)
                     Map (H5M) API: no
                        Direct VFD: no
                        Mirror VFD: no
                     Subfiling VFD: no
                (Read-Only) S3 VFD: no
              (Read-Only) HDFS VFD: no
    Packages w/ extra debug output: none
                       API tracing: no
              Using memory checker: no
            Function stack tracing: no
                  Use file locking: best-effort
         Strict file format checks: no
      Optimization instrumentation: no

ArvidJB commented 4 days ago

PS: the code I wrote is basically just a slightly adapted version of the code in h5py: https://github.com/h5py/h5py/blob/44ba349e5ecd21ecbc75b1314063e96d30aec872/h5py/_hl/dataset.py#L1137 and https://github.com/h5py/h5py/blob/44ba349e5ecd21ecbc75b1314063e96d30aec872/h5py/h5p.pyx#L936
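
For context, the h5py code linked above appears to walk the virtual mappings stored in a dataset's creation property list. A minimal Python sketch of that pattern follows, assuming a `dcpl` object that offers the same methods as h5py's `PropDCID`. `list_virtual_sources` and `VirtualSource` are hypothetical names for illustration, and this is not the Cython code in slicetools.pyx.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; h5py's own VDSmap also carries
# the virtual and source dataspaces.
VirtualSource = namedtuple("VirtualSource", ["file_name", "dset_name"])


def list_virtual_sources(dcpl):
    """Collect (file name, dataset name) for every mapping of a virtual dataset.

    `dcpl` is expected to behave like h5py.h5p.PropDCID, which provides
    get_virtual_count(), get_virtual_filename(i) and get_virtual_dsetname(i).
    """
    return [
        VirtualSource(dcpl.get_virtual_filename(i), dcpl.get_virtual_dsetname(i))
        for i in range(dcpl.get_virtual_count())
    ]


# Usage with h5py (not executed here; `path` and `name` are placeholders):
#     with h5py.File(path, "r") as f:
#         dcpl = f[name].id.get_create_plist()
#         print(list_virtual_sources(dcpl))
```

Running this through h5py's own bindings on an affected file is one way to see whether the property list is readable from Python even when the compiled extension fails.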