Closed angus-g closed 1 year ago
Hi @angus-g. It sounds like this is an MKL issue more than a Firedrake one, though having the solution documented here may well be useful to other users.
@dacreman has established git repositories in the Firedrake organisation that have build and run scripts for ARCHER and Isambard, two of the UK national supercomputers: https://github.com/firedrakeproject/isambard https://github.com/firedrakeproject/firedrake-archer
We would be very happy to host a similar repo for gadi if you think that would be helpful.
Thanks @dham, it's certainly a system-specific issue. However, there's a lot on the Firedrake side that I'm not completely across yet, such as the dependency installation, code generation and loading, etc. I thought about trying to test PETSc itself, to see if I can replicate the error.
I think once we get a working build sorted out, having a similar repo for gadi would be really helpful!
I've just encountered this issue in a fresh install on niagara, Compute Canada's resource for "large parallel jobs", and the command-line fix above does seem to work there as well.
@angus-g : did you find a more elegant fix for this?
Unfortunately not, but I've wrapped Firedrake in an environment module so this can be handled transparently (this shows just the way to activate/deactivate Firedrake, and then environment variables can be exported as usual):
source /opt/Modules/extensions/extensions.tcl
set-basedir -root /g/data/xd2/modules
if { [module-info mode load] || [module-info mode switch2] } {
    puts stdout "source $::basedir/bin/activate;"
} elseif { [module-info mode remove] && ![module-info mode switch3] } {
    puts stdout "deactivate;"
}
Just to complete the documentation of what works for me, I now execute
export LD_LIBRARY_PATH=$MKLROOT/lib/intel64/:$LD_LIBRARY_PATH
in my job scripts, after loading the MKL module and activating Firedrake. I think the real issue here is simply that the MKL module isn't setting the library path correctly...
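For anyone adapting this to their own job scripts, the export can be wrapped in a small guard so that it fails loudly if the MKL module has not been loaded and does not add the same directory twice. This is only a minimal sketch, not something taken from the thread: the function name prepend_mkl_libdir is hypothetical, and it assumes the MKL module sets MKLROOT and that the libraries live under lib/intel64, as in the export above.

```shell
# Hypothetical helper: prepend the MKL library directory to LD_LIBRARY_PATH.
# Assumes the MKL environment module has already set MKLROOT.
prepend_mkl_libdir() {
    if [ -z "${MKLROOT:-}" ]; then
        echo "MKLROOT is not set; load the MKL module first" >&2
        return 1
    fi
    _mkl_libdir="$MKLROOT/lib/intel64"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$_mkl_libdir:"*)
            # Already on the path, nothing to do.
            ;;
        *)
            # Prepend, adding a ':' separator only if the path was non-empty.
            LD_LIBRARY_PATH="$_mkl_libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}
```

In a job script this would be called once, after loading the MKL module and activating Firedrake, in place of the bare export line.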
I'm closing this issue as it is quite old and seems to have a working solution. Feel free to reopen if you disagree.
I've managed to build Firedrake on our new HPC platform in Australia, gadi. I had intel-mkl/2019.3.199 loaded as an environment module during the install, which went cleanly. However, running the test suite ran into a lot of errors. For example, running the ma-demo (just because it's the first failing test from pytest -x), I got:
Following https://stackoverflow.com/a/37160780/11838997, it seems like MKL isn't being linked in properly somewhere. Indeed, setting
LD_PRELOAD=$MKLROOT/lib/intel64/libmkl_core.so:$MKLROOT/lib/intel64/libmkl_sequential.so
allows me to run the demo as expected.
As far as I can tell, the error cropped up from the u_solv.solve() line. Checking all the shared libraries in the cache directory with ldd, both expected MKL libs are present.
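The ldd check described above could be scripted roughly as follows. This is a sketch under assumptions: the function name check_mkl_links is hypothetical, the cache directory is passed in as an argument because its location is not given here, and it only looks at files ending in .so directly inside that directory.

```shell
# Hypothetical helper: for each shared object in a directory, print the
# MKL-related lines that ldd reports, or a note if there are none.
check_mkl_links() {
    _dir="$1"
    for _so in "$_dir"/*.so; do
        # Skip the unexpanded glob pattern when the directory has no .so files.
        [ -e "$_so" ] || continue
        echo "== $_so"
        ldd "$_so" | grep -i 'mkl' || echo "   (no MKL libraries listed)"
    done
}
```

A line that ldd marks as "not found" would point to a search-path problem. If every MKL library resolves, as reported above, then the failure is more likely to come from how MKL loads its own components at run time, which would be consistent with the LD_PRELOAD workaround helping.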