ocaisa opened 3 years ago
To get it to use RDMA inside the prefix layer, I needed to explicitly provide the path:
configopts = '--enable-optimizations --enable-cma --enable-mt --with-verbs --with-rdmacm=/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/usr --with-sysroot=/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64'
Just setting --with-sysroot is not enough. This is probably why some architectures get the support and others don't: it depends on what is on the host.
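A minimal sketch of deriving both paths from one sysroot, so the rdmacm location cannot drift away from the sysroot. `ucx_configopts` is a hypothetical helper, and it assumes rdmacm lives under `<sysroot>/usr`, as in the configopts line above.

```python
import os


def ucx_configopts(sysroot: str) -> str:
    """Build UCX configure options that point rdmacm explicitly at the
    compat layer instead of relying on --with-sysroot alone (hypothetical helper)."""
    # Assumption: rdmacm headers/libraries are under <sysroot>/usr,
    # matching the --with-rdmacm path used in the configopts line.
    rdmacm_prefix = os.path.join(sysroot, "usr")
    opts = [
        "--enable-optimizations",
        "--enable-cma",
        "--enable-mt",
        "--with-verbs",
        f"--with-rdmacm={rdmacm_prefix}",
        f"--with-sysroot={sysroot}",
    ]
    return " ".join(opts)


if __name__ == "__main__":
    print(ucx_configopts("/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64"))
```

For the 2020.12 x86_64 prefix this reproduces the configopts string above exactly.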
This is not really an issue with the prefix layer, so I'm going to move it.
I also checked libfabric, and in its configure output I see
checking for sysroot... no
which I would also be suspicious of.
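To spot results like this across many build logs, a small parser for autoconf-style "checking ... result" lines can help. This is a sketch only: `configure_results` is a hypothetical helper, and it assumes each check and its result appear on a single line of the log.

```python
import re

# Matches autoconf-style result lines, e.g. "checking for sysroot... no".
# The optional "for " is dropped so the key is just the thing being checked.
CHECK_RE = re.compile(r"^checking (?:for )?(?P<what>.+?)\.\.\. (?P<result>.*)$")


def configure_results(log_text: str) -> dict:
    """Map each configure check to its reported result (hypothetical helper)."""
    results = {}
    for line in log_text.splitlines():
        m = CHECK_RE.match(line.strip())
        if m:
            results[m.group("what")] = m.group("result")
    return results


if __name__ == "__main__":
    sample = "checking for sysroot... no\nchecking whether make sets $(MAKE)... yes"
    res = configure_results(sample)
    if res.get("sysroot") == "no":
        print("warning: configure did not pick up a sysroot")
```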
I checked our 2020.10 installation, and it has the same issue / configuration output. The output in the comment at https://github.com/EESSI/compatibility-layer/issues/49#issuecomment-706192572 is from a manual UCX installation where I did explicitly pass --with-rdmacm to configure, so we should somehow pass this to our UCX installation as well (using a hook?).
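One way the hook idea could look, assuming EasyBuild's hooks mechanism, where a `parse_hook(ec, *args, **kwargs)` function receives each parsed easyconfig. This is a sketch, not an existing EESSI hook: the use of an `EPREFIX` environment variable and its fallback value are assumptions, and `FakeEasyConfig` is a hypothetical stand-in so the snippet runs without EasyBuild.

```python
import os

# Assumption: EPREFIX points at the compat layer prefix; the fallback is the
# 2020.12 x86_64 prefix quoted earlier in this thread.
COMPAT_PREFIX = os.environ.get(
    "EPREFIX", "/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64"
)


def parse_hook(ec, *args, **kwargs):
    """Parse hook sketch: make UCX look for rdmacm in the compat layer."""
    if ec.name == "UCX":
        flag = "--with-rdmacm=" + os.path.join(COMPAT_PREFIX, "usr")
        opts = ec["configopts"] or ""
        # Add the flag only once, and leave an explicit --with-rdmacm alone.
        if "--with-rdmacm" not in opts:
            ec["configopts"] = (opts + " " + flag).strip()


class FakeEasyConfig(dict):
    """Hypothetical stand-in for an EasyBuild EasyConfig, for illustration only."""

    def __init__(self, name, **params):
        super().__init__(params)
        self.name = name


if __name__ == "__main__":
    ec = FakeEasyConfig("UCX", configopts="--enable-mt --with-verbs")
    parse_hook(ec)
    print(ec["configopts"])
```

Because configure errors out when an explicitly requested --with-rdmacm cannot be satisfied, a hook like this would also turn a silently missing rdmacm into a build failure.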
I see that the configure scripts of both libfabric and UCX accept a --with-sysroot flag:
--with-sysroot=DIR Search for dependent libraries within DIR
(or the compiler's sysroot if not specified).
As the compiler has been configured with --with-sysroot set to the prefix, I assume we don't necessarily need to pass this flag to these packages.
This has been fixed in 2021.03:
configure: UCT modules: < ib rdmacm cma >
We should still have some (ReFrame?) test to make sure that UCX is correctly configured in future versions, though, so let's leave this issue open so we don't forget.
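The core of such a test could be a check on the "UCT modules" summary line in the UCX build log. This is plain Python rather than a ReFrame test (a check of this kind could be wrapped in one); the helper names are hypothetical, and the default expected set is taken from the 2021.03 output above.

```python
import re

# Matches the UCX configure summary, e.g. "configure: UCT modules: < ib rdmacm cma >".
UCT_RE = re.compile(r"UCT modules:\s*<\s*(?P<mods>[^>]*?)\s*>")


def uct_modules(log_text):
    """Return the set of UCT modules reported in a UCX build log."""
    m = UCT_RE.search(log_text)
    if m is None:
        raise ValueError("no 'UCT modules' line found in the build log")
    return set(m.group("mods").split())


def missing_uct_modules(log_text, expected=("ib", "rdmacm", "cma")):
    """List expected UCT modules that the build did not enable, sorted."""
    return sorted(set(expected) - uct_modules(log_text))


if __name__ == "__main__":
    log = "configure: UCT modules: < ib rdmacm cma >"
    missing = missing_uct_modules(log)
    print("OK" if not missing else f"missing UCT modules: {missing}")
```

Raising when the summary line is absent is deliberate: a log without it should fail the check rather than pass vacuously.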
I was looking at the UCX configuration in 2020.12 and noticed what looks like a regression. From https://github.com/EESSI/compatibility-layer/issues/49#issuecomment-706192572 it looks like we should have a UCX configuration whose UCT modules include rdmacm, but in the build log for UCX (on Zen2) rdmacm is missing. We should probably explicitly pass what we expect from the final build (--with-rdmacm), so that configure fails rather than silently building without it. UCX in particular is critical to the stack, so it could do with additional checks.