Closed brentmaas closed 8 months ago
Thanks for alerting us to this @brentmaas! I think adding a flag may indeed be a good solution here, and probably we can set it to use the latest hdf5lib by default.
Hi Brent, nice find, and thanks for reporting it!
I disagree about adding a flag. We don't know how many issues switching from an older to a newer library version will cause, but we also don't know how many issues switching from an older to a newer library version will solve (it may be more than just this one).
The solution to that is having a test suite with good coverage, and operating under the assumption that if it isn't being tested, it's broken anyway; therefore if a change doesn't break any tests (and fixes an issue for which you've just added a test) then it's made things better and should be applied. If a change increases the complexity of the software and the user interface (as adding a flag would do), then you need to consider whether that increase in complexity is worth the added functionality.
So, in this particular instance, I would just make the change. We have no evidence that doing so would break anything, and so no justification for the added complexity of adding a flag. If a user shows up whose script got broken, then they can temporarily downgrade to the old version until we implement an appropriate fix for their issue, be that a flag or something else.
Thanks @LourensVeen - you're probably right :). Let's make this change in a PR and see if anything breaks. @brentmaas since you reported it, would you want to create the PR?
Sure, I'll have a go at it!
Though I would say that passing additional kwargs from `write_set_to_file` to h5py doesn't seem like such a bad idea to me either... But perhaps that's not for this issue.
Also, it looks like this will break compatibility with HDF5 versions older than 1.10, which was released in March 2018, but only if the user writes their files using 1.10 or later first. That doesn't seem a very likely scenario.
No, that doesn't seem likely. Perhaps if we wrote initial condition files on one computer (with a recent HDF5) and then used them on a supercomputer with an older HDF5.
Ah yes, that could happen. Well, we'll see.
Seems like there would be a workaround even for this case, e.g. by just writing the file with AMUSE format v1.
Is your feature request related to a problem? Please describe.
I've noticed that long-running scripts that iteratively write data using `amuse.io.write_set_to_file` tend to become noticeably slower the further into the run they get, even though the workload in each iteration should take constant time. After some digging, I've found that this is a known problem with older HDF5 library versions in h5py. The solution in that Stackoverflow thread suggests adding the parameter `libver='latest'` to the instantiation of `h5py.File` to use a more recent version of HDF5 where this issue is fixed.
This can be tested using a simple script:
On my unedited version of AMUSE, this will output the following:
However, if I edit `amuse.io.store_v2.py` to add `libver='latest'` on lines 743, 745 and 747 according to the Stackoverflow solution, I get the following:
This is a significant speed improvement. The output file is also about half the size: 60 MB compared to the 111 MB of the first run.
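The test script and its output are not preserved in this thread, so the following is not the original. It is only a minimal sketch of how a per-iteration slowdown like the one described could be measured. The helper names `time_writes`, `slowdown_ratio` and `demo` are hypothetical, and `demo` writes with h5py directly rather than through AMUSE, so its numbers would not be comparable to the ones reported here.

```python
import time


def time_writes(write_once, n_iterations):
    """Call write_once(i) n_iterations times; return each call's duration in seconds."""
    durations = []
    for i in range(n_iterations):
        start = time.perf_counter()
        write_once(i)
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations, window=10):
    """Mean duration of the last `window` calls divided by that of the first `window`."""
    window = min(window, len(durations) // 2)
    if window == 0:
        raise ValueError("need at least two timings")
    first = sum(durations[:window]) / window
    last = sum(durations[-window:]) / window
    return last / first


def demo(n_iterations=2000):
    """Assumed setup: append one small dataset per iteration, reopening the file each time."""
    import os
    import tempfile

    import h5py
    import numpy as np

    for libver in (None, "latest"):  # None is h5py's default behaviour
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "test.h5")

            def write_once(i, path=path, libver=libver):
                with h5py.File(path, "a", libver=libver) as f:
                    f.create_dataset(f"step_{i:06d}", data=np.random.random(1000))

            durations = time_writes(write_once, n_iterations)
            print(libver, slowdown_ratio(durations, window=100), os.path.getsize(path))
```

Calling `demo()` requires h5py and numpy. A ratio well above 1 for one setting and close to 1 for the other would point to the kind of slowdown reported above.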
It is important to note the reason the Stackoverflow solution gives for this not being the default behaviour in h5py: compatibility. While an unedited `read_set_from_file` still seems to read both output files fine, I cannot predict whether there will be breaking changes.
Describe the solution you'd like
Given the clear positive effect, but unknown probability of breaking changes, I think adding a flag (e.g. `h5py_use_latest_libver`) to `write_set_to_file` would be the correct way to go, where the default (False) would keep the current behaviour and True would add `libver='latest'` to any underlying `h5py.File` instantiations. Alternatively, a parameter (e.g. `h5py_libver`) could be added which directly passes its value to `libver` in `h5py.File`, so that other features of `libver` can also be used in case someone wants that.
Additional context
Operating system version: Linux 6.5.9-arch2-1
Compiler version: GCC 13.2.1
Python version: 3.11.5
AMUSE version: commit 6510b63f4e6fe26e7828d629a2c12e2dd60120f3 (24th of September, 2023); effectively latest for the entirety of `amuse.io`
H5py version: 3.10.0
Numpy version: 1.25.2
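For illustration, the two options proposed under "Describe the solution you'd like" could be sketched roughly as below. This is not AMUSE code: `libver_kwargs` and `open_store` are hypothetical helpers, and the `opener` argument is an assumption added only so the sketch can be exercised without h5py installed. The only real API assumed is the `libver` keyword of `h5py.File`.

```python
def libver_kwargs(h5py_use_latest_libver=False, h5py_libver=None):
    """Translate the two proposed options into keyword arguments for h5py.File.

    An explicit h5py_libver takes precedence over the boolean flag; with
    neither set, nothing is passed and h5py keeps its default behaviour.
    """
    if h5py_libver is not None:
        return {"libver": h5py_libver}
    if h5py_use_latest_libver:
        return {"libver": "latest"}
    return {}


def open_store(path, mode="a", opener=None, **options):
    """Open a file, forwarding the libver choice to the opener (h5py.File by default)."""
    if opener is None:
        import h5py  # imported lazily so the helper above works without h5py

        opener = h5py.File
    return opener(path, mode, **libver_kwargs(**options))
```

With this shape, `open_store("out.h5", "a", h5py_use_latest_libver=True)` would end up calling `h5py.File("out.h5", "a", libver="latest")`, while omitting both options leaves the call unchanged.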