DiamondLightSource / durin

BSD 3-Clause "New" or "Revised" License

"requirements" should document that libh5bshuf.so is required for certain HDF5 files #27

Open · KayDiederichs opened this issue 2 years ago

KayDiederichs commented 2 years ago

It took me some time to realize that, for some data sets, libh5bshuf.so must be present in /usr/local/hdf5/lib/plugin, or in the directory pointed at by the environment variable HDF5_PLUGIN_PATH. Could this please be documented? I am trying to compile bitshuffle-0.4.2 on an M1 Apple, but I am having a hard time; it seems to want to compile for x86_64 Apple... If somebody could prepare such a library, durin could be used natively on Apple M1.
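
For reference, a minimal sketch (not durin code) of how one can check from C whether HDF5 can see the bitshuffle filter; it assumes bitshuffle's registered HDF5 filter ID 32008 and HDF5 >= 1.10.1 for the H5PL plugin-path API:

/* Sketch: probe for the bitshuffle filter (registered HDF5 filter ID 32008).
 * Assumes HDF5 >= 1.10.1 for H5PLprepend; depending on the HDF5 version,
 * H5Zfilter_avail may also search the dynamic plugin path. Not durin code. */
#include <hdf5.h>
#include <stdio.h>

#define BSHUF_H5FILTER 32008

int main(void) {
  /* extend the plugin search path programmatically, in addition to the
   * HDF5_PLUGIN_PATH environment variable / the built-in default path */
  H5PLprepend("/usr/local/hdf5/lib/plugin");

  if (H5Zfilter_avail(BSHUF_H5FILTER) > 0)
    printf("bitshuffle filter available\n");
  else
    printf("bitshuffle filter NOT available (libh5bshuf not found/registered)\n");
  return 0;
}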

graeme-winter commented 2 years ago

@KayDiederichs thanks for flagging this - it should not be needed; the filter should be compiled in. I now have an M1 Mac, so I will push this back up the to-do list.

Also see

https://github.com/DiamondLightSource/durin/commit/5d0b7bd104c50e7390e686cf08c6116e12f228f9

Does this extra line make it work for you? I am also aware that @ndevenish has a CMake build coming - https://github.com/DiamondLightSource/durin/pull/26

(this repo could do with some attention)

KayDiederichs commented 2 years ago

Thanks, Graeme. I tried the extra -noshlib option on x86_64 Linux, but it makes no difference: /usr/local/hdf5/lib/plugin/libh5bshuf.so is still needed. Now that you mention that it is already compiled in, I don't understand why this does not work - but maybe the filter plugin mechanism of HDF5 does not expect a compiled-in filter.

ndevenish commented 2 years ago

I’ll try to have a look at this today. Are there [XDS] public M1 Mac builds yet? @graeme-winter I believe you have a test copy you could forward me otherwise?

graeme-winter commented 2 years ago

Appears to already be online at https://xds.mr.mpg.de/html_doc/downloading.html

ndevenish commented 2 years ago

Ahh, I looked there but was being dumb and read "(emulated on apple silicon)" and just stopped.

ndevenish commented 2 years ago

@KayDiederichs, I can't reproduce this with our datasets. The reason we don't believe the plugin should normally be required is that, when it can, durin reads the data chunks directly and uses its internal bitshuffle to decompress them manually.

That appears to be controlled here: https://github.com/DiamondLightSource/durin/blob/5d0b7bd104c50e7390e686cf08c6116e12f228f9/src/file.c#L878

which appears to be gated behind a check on the file's layout:

  if (H5Lexists(visit_result->nxdetector, "data_000001", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_dectris_eiger_dataset_dims;
  } else if (H5Lexists(visit_result->nxdetector, "data", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_nxs_dataset_dims;
  } else if (H5Lexists(visit_result->nxdata, "data_000001", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_dectris_eiger_dataset_dims;
  } else if (H5Lexists(visit_result->nxdata, "data", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_nxs_dataset_dims;
  } else {
    ERROR_JUMP(-1, done, "Could not locate detector dataset");
  }

So, I guess if you have an h5 file that only has /entry/data and not a data_000001, it doesn't use the direct chunk read and instead goes through HDF5's normal dataset handling, which requires the plugin.
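
For what it's worth, a hedged sketch of the two paths (not durin's actual code; it assumes HDF5 >= 1.10.2 for H5Dread_chunk/H5Dget_chunk_storage_size and the usual 12-byte bitshuffle/LZ4 chunk header):

/* Sketch of the direct-chunk-read path, which bypasses the HDF5 filter
 * pipeline and decompresses with bitshuffle's own API. Assumptions:
 * HDF5 >= 1.10.2, chunks written by the bitshuffle/LZ4 HDF5 filter. */
#include <hdf5.h>
#include <stdint.h>
#include <stdlib.h>
#include "bitshuffle.h"   /* for bshuf_decompress_lz4() */

static int read_chunk_direct(hid_t dset, const hsize_t *offset,
                             void *out, size_t nelem, size_t elem_size) {
  hsize_t chunk_bytes = 0;
  uint32_t filter_mask = 0;

  /* size of the raw (still compressed) chunk as stored on disk */
  if (H5Dget_chunk_storage_size(dset, offset, &chunk_bytes) < 0) return -1;

  char *raw = malloc((size_t)chunk_bytes);
  if (!raw) return -1;

  /* fetch the raw chunk bytes, bypassing the HDF5 filter pipeline */
  if (H5Dread_chunk(dset, H5P_DEFAULT, offset, &filter_mask, raw) < 0) {
    free(raw);
    return -1;
  }

  /* bitshuffle/LZ4 chunks start with an 8-byte (big-endian) uncompressed
   * size and a 4-byte block size; the LZ4 stream begins at byte 12.
   * block_size = 0 assumes the writer used bitshuffle's default. */
  int64_t n = bshuf_decompress_lz4(raw + 12, out, nelem, elem_size, 0);
  free(raw);
  return (n < 0) ? -1 : 0;
}

/* The fallback path is just a normal read, which only works if HDF5 can
 * find the bitshuffle filter plugin (libh5bshuf):
 *   H5Dread(dset, H5T_NATIVE_UINT16, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
 */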

@graeme-winter, does this sound about right?

KayDiederichs commented 2 years ago

Thanks for the explanation! Wolfgang and I have been looking at a dataset consisting of xxx_master.h5 and xxx_data_000005.h5. Durin worked well on my computers (which happen to have a long-forgotten /usr/local/hdf5/lib/plugin/libh5bshuf.so from 2015) but not on his (which don't). Since it took us a few days to realize that this difference was responsible for the failure, it would be good to document it. BTW, after reading Nick's message I symlinked xxx_data_000005.h5 to xxx_data_000001.h5; this did change the behaviour of durin, but it still failed, with a different error message, so this trick does not work.

fleon-psi commented 2 years ago

Quick suggestion, to avoid providing a separate bshuf filter: a) include bshuf_h5filter.c/bshuf_h5filter.h from the bitshuffle source; b) add an H5Zregister call in durin itself (see the example in bshuf_h5filter.c). Then you will be able to use bitshuffle via the HDF5 built-in filter mechanism.
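
A minimal sketch of that suggestion, assuming bitshuffle's bshuf_register_h5filter() from bshuf_h5filter.c/.h (which wraps H5Zregister); the init_filters() hook below is hypothetical, not an existing durin function:

/* Sketch: compile bshuf_h5filter.c into durin and register the filter once
 * at startup, so plain H5Dread calls no longer need the external
 * libh5bshuf plugin. init_filters() is a hypothetical hook, not durin API. */
#include "bshuf_h5filter.h"

int init_filters(void) {
  /* bshuf_register_h5filter() calls H5Zregister() internally and returns
   * a negative value if registration fails */
  if (bshuf_register_h5filter() < 0)
    return -1;
  return 0;
}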

ndevenish commented 2 years ago

Ah, I think I was both unclear, and slightly misunderstood.

I was referring to the internal structure of the HDF5 file:

$ h5dump -n ins_6_1_master.h5
HDF5 "ins_6_1_master.h5" {
FILE_CONTENTS {
 group      /
 group      /entry
 group      /entry/data
 dataset    /entry/data/data
 ext link   /entry/data/data_000001 -> ins_6_1_000001.h5 /data
 ext link   /entry/data/data_000002 -> ins_6_1_000002.h5 /data
....

I think I thought that /entry/data/data_000001 was the standard NeXus way, because that's what all of ours do. It looks like that is a DLS implementation detail - one that we're checking for directly in Durin.

Quick suggestion, to avoid providing a separate bshuf filter: a) include bshuf_h5filter.c/bshuf_h5filter.h from the bitshuffle source; b) add an H5Zregister call in durin itself (see the example in bshuf_h5filter.c). Then you will be able to use bitshuffle via the HDF5 built-in filter mechanism.

Hmm, this sounds like a lot less work than I had anticipated, and it would be nice to just resolve the problem without having to have the filters set up. (Alternatively, I think hdf5plugin (GitHub, Anaconda) does a lot of the work to get the plugin set compiling on most regular platforms, so pulling their binaries might work.)

KayDiederichs commented 2 years ago

Apple M1 is not a regular platform, and I could not find a libh5bshuf.so for it (and it would also have to work with a durin built with gcc-12, rather than with something compiled with Apple's Clang compiler). See also issue #24.

ndevenish commented 2 years ago

Well, it's a regular platform in conda-forge terms - https://anaconda.org/conda-forge/hdf5plugin/files has osx-arm64 builds. (Well, almost a regular platform: it is cross-compiled, but works well enough for DIALS.) All the conda-forge stuff is, however, compiled with (non-Apple) clang, if that is an issue. FWIW, on my M1, durin built with the current main-branch Makefile works, compiled using h5cc from conda-forge, which is itself built with clang.

I'm not suggesting any of this as a good solution, but if the choice is between struggling with a manual build and using a prebuilt binary, it seems better than not being able to analyse data.

We should definitely try to handle this (common) case better here, though.

KayDiederichs commented 2 years ago

Thanks for pointing to that URL! My google-fu didn't find it. I downloaded libh5bshuf.dylib and it works for me too.

graeme-winter commented 2 years ago

Quick suggestion, to avoid providing a separate bshuf filter: a) include bshuf_h5filter.c/bshuf_h5filter.h from the bitshuffle source; b) add an H5Zregister call in durin itself (see the example in bshuf_h5filter.c). Then you will be able to use bitshuffle via the HDF5 built-in filter mechanism.

@fleon-psi - yes, we were discussing this earlier - though, as @ndevenish pointed out, there are also some "routing" questions about how we deal with Diamond data versus more "native" (e.g. DECTRIS file writer) data.