bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
166 stars 17 forks source link

enable setting HDF5_CFLAGS and HDF5_LIBS external to ./configure #124

Closed douglasgscofield closed 1 month ago

douglasgscofield commented 2 months ago

The local ./configure for BPCells does not have a mechanism for setting HDF5_CFLAGS and HDF5_LIBS external to the script, as do other ./configure scripts; it relies only on its own internal mechanisms for determining the flags for interacting with hdf5. We do not provide hdf5 in the standard locations, but rather provide it from an Lmod module, so the include files and libraries for hdf5 are not in standard system locations..

This just uses a little bashiness to set the compilation and link flag variables to current environment variables if these are provided, otherwise it uses the current defaults. With these changes, compilation against an hdf5 in a nonstandard location works correctly with

HDF5_CFLAGS="-I${HDF5_INCLUDE}" HDF5_LIBS="-L${HDF5_LIB} -lhdf5" R CMD INSTALL BPCells_0.2.0.tar.gz

assuming HDF5_INCLUDE is set to the hdf5 .../include directory and HDF5_LIB is set to its .../lib directory. Any other mechanism that can provide settings for these variables can be used to set these variables within the ./configure.

This can address issues in #86 and #84.

bnprks commented 2 months ago

Hi @douglasgscofield, thanks for taking a look at this - getting HDF5 to link properly can definitely be a challenge on some environments. Could you let me know a bit more about what the circumstances are of your cluster that require these changes as opposed to using the existing config options?

(in order attempted, these are the R Makevars/R CMD config options; pkg-config options; h5cc autodetect; then conda autodetect. These try to start with the most standard approach, then fall back on less standard options if needed)

In your installation setup, will an R user always have the same HDF5 version (e.g. if the R lmod module depends on HDF5)? If so, simply adjusting the per-user or system-wide Makevars should be possible, as I described in my comment on #86. Due to some sloppiness on my part, I think it currently requires setting both CFLAGS and CXXFLAGS (and likely LDFLAGS).

Presumably your hdf5 lmod setup also does not set up pkg-config .pc files, or else the config would also be working already.

I can see how being able to set environment variables could make the config easier, though I'd have a slight preference for piggybacking on the more-standard CFLAGS, CXXFLAGS, LDFLAGS (or LIBRARY_PATH) variables rather than having the user set HDF5_* variables specific to BPCells only.

douglasgscofield commented 2 months ago

Our build of hdf5/1.14.0 is from source, and that did not provide pkg-config support.

The pull request is meant to be in the spirit of autotools' ./configure scripts, which have options like --with-hdf5=$HDF5_PREFIX or cmake's -D variables for library and include file locations, both of which have cascading effects onto CFLAGS/CXXFLAGS/LDFLAGS. If an option would be preferable to external environment variables that could certainly be done.

The hdf5 against which BPCells is built should be selectable at build time, since there is no base dependency of R on hdf5 and BPCells does not use rhdf5 or another HDF5-facing R package that would introduce its own dependency. Meaning that the version built against is determined by our loaded module and the options to ./configure as it is when we build against any other library. Different users should be able to build against different hdf5 versions so long as their APIs are compatible with BPCell's expectations. RPATH info is sufficient (with -Loptions) to find the appropriate libhdf5.so at runtime. Unfortunately things like CPATH and LD_RUN_PATH specified by our hdf5 module (and other library modules) seem rarely to be honoured during ./configure and cmake runs, hence the necessity of library-specific options for locating these dependencies.

What is specifically happening with BPCells' ./configure is that the system-installed hdf5 (1.10.x) is being found during the first test compilation, resulting in only -lhdf5 being added to LDFLAGS, but we want to build against the newer module hdf5/1.14.0. This situation often occurs when clusters are running older OSs but we want to use newer libraries. My ./configureand cmake lines can get quite long specifying dependencies.

bnprks commented 2 months ago

Okay, that makes sense for why the environment variable would be preferred then. I take it that you are in the position of an admin who would have the ability to make the hdf5 modules modify some arbitrary environment variables in order to make BPCells installation run more smoothly across the cluster?

Environment variable seems better for this than adding a command line option, as any users installing via remotes::install_github will more easily be able to set environment variables to alter the compilation.

My ideal case would be if the BPCells installation naturally respected CPATH and LIBRARY_PATH environment variables. From some quick testing it appears that R CMD install works with CPATH already, but LIBRARY_PATH didn't obviously do anything.

As a second choice, I'd probably prefer taking CFLAGS and LDFLAGS from the environment, so that we wouldn't have to add separate HWY_CFLAGS or EIGEN_CFLAGS for all other C/C++-level dependencies. Something along the lines of:

CFLAGS="$CFLAGS $("${R_HOME}/bin/R" CMD config CFLAGS)"
LDFLAGS="$LDFLAGS $("${R_HOME}/bin/R" CMD config LDFLAGS)"

but with some modification to make things propagate correctly to Makevars.in via that final sed command

I am on vacation for the next 10 days and won't be able to do much more checking here until then, but I will follow back up then

bnprks commented 2 months ago

Hi @douglasgscofield, I'm back now from vacation.

I'm currently leaning away from using your suggestion of reading HDF5_CFLAGS and HDF5_LIBS as environment variables, so I think you're off the hook for having to make code adjustments to satisfy my preferences.

That said, I'd still be interested adjusting the configure script myself to address your use-case, perhaps with the solution I described above (reading CFLAGS and LDFLAGS from both the environment and R CMD config). I'd want to confirm with you that it would actually address your issue, though, before making the changes.

Let me know if that sounds like a reasonable plan

-Ben

douglasgscofield commented 2 months ago

Does that mean that you would rewrite the ./configure script to no longer use separate HDF5 variables, it would all be run through CFLAGS and LDFLAGS? That would allow a general mechanism for specifying options for dependencies and it seems like it would work for me. So if you choose to go ahead with that, that seems reasonable.

Though I wonder if you are losing some specificity that would be helpful? Maybe reducing it to a single variable? For example, CMake policy CMP0074 being true has it utilise a variable like HDF5_ROOT if it finds it, treating it like a prefix of an install tree, so if ./configure finds HDF5_ROOT defined (and similarly for any other dependencies) and that location passes sanity checks, then -I${HDF5_ROOT}/include is added to CFLAGS and -L${HDF5_ROOT}/lib -lhdf5 to LDFLAGS. This is also analogous to a classic autotools option like ./configure --with-hdf5=$HDF5_ROOT ...

Just an alternative that might be simpler, but I'm fine either way.

bnprks commented 2 months ago

I'd probably keep most of the variables as-is in the configure script, mostly just add more flexibility to the user-facing configuration options. As a user, this would mean that in addition to setting CFLAGS and LDFLAGS in the standard ~/.R/Makevars file, one could choose to use environment variables to provide flags to the compiler or linker respectively. BPCells would concatenate the flags read from both locations, and use those flags during all test compilation steps. (probably putting the environment variables second so they take priority)

My main concern here is keeping the user configuration options as simple to explain as possible, so I prefer allowing the user to directly set compiler flags rather than a more indirect method like specifying HDF5_ROOT.

If you don't mind me pushing some commits onto your personal main branch I can make my suggested edits in this PR, otherwise I can make a new demo branch to make sure it works for you before merging into main.

douglasgscofield commented 2 months ago

Sure thing

bnprks commented 1 month ago

Hi @douglasgscofield, sorry for the delay on this. I've just pushed some updates to this PR that I think should address your issue. The configure script will now read + use any flags given in the environment variables CFLAGS and LDFLAGS.

While I was at it, I also made a change to fix the printing when running with the environment variable BPCELLS_DEBUG_INSTALL set. The only relevant bits for this fix though are the ENV_CFLAGS and ENV_LDFLAGS variables.

Let me know if that seems to fix things for you and if so I'll merge this into main.

bnprks commented 1 month ago

Hi @douglasgscofield, hopefully these changes are useful to you. I'm merging them in now to wrap this up, but feel free to file an issue or something if it doesn't provide the flexibility you need for your installation configuration