Open boegel opened 11 years ago
...and this might just solve a bug that we are noticing in a particular test case with the iomkl stack:
we think that OpenMPI's orted cannot find its pieces (under OAR and similar schedulers) unless the system helps it out via LD_LIBRARY_PATH. You may never notice this if you are on PBS, because it inherits the environment from the submitting process using its own mechanism (SGE provides the -V option for the same reason).
The solution to this could perhaps be RPATH or a static compile (great as a firework for an upcoming hackathon).
I think gradually moving to static or RPATH builds, regardless of whether there are any observed problems, is good in an HPC setting...
Having the option to build the world with RPATH is good; making it compulsory, though, I think not really:
http://www.open-mpi.org/faq/?category=all#why-no-rpath
(in fact, it defeats the great feature of recent OpenMPI stacks of being ABI-compatible, which allows important freedom for some experimentation - module swap is your friend)
addendum: ditto for static, ref. http://www.open-mpi.org/faq/?category=all#static-mpi-apps , 101 & 102
@pforai: do you have more details on the ld wrapper that CSC.fi uses for making RPATH linking easy?
I had another look at this for hpcugent/easybuild-easyconfigs#2228
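We should just use a wrapper for ld with contents:
#!/bin/bash
# Prepend the options listed in the file $EBLINKEROPTIONS (ld's @file syntax), then pass the original arguments through unchanged.
ld @"$EBLINKEROPTIONS" "$@"
Where $EBLINKEROPTIONS must point to a file with all the linker options. This file must exist but can be empty.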
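A slightly more defensive variant of such a wrapper might look like the sketch below. It assumes the wrapper sits first in PATH under the name ld, so it has to skip its own directory when looking for the real linker, otherwise it would call itself. Only EBLINKEROPTIONS comes from this thread; the function names and the LD_CMD array are made up for illustration.

```shell
#!/bin/bash
# Sketch of an ld wrapper. EBLINKEROPTIONS is the variable named in this
# thread; find_real_tool, build_ld_command and LD_CMD are hypothetical names.

# Print the first executable called "$1" on PATH that is not inside "$2"
# (the wrapper's own directory), so the wrapper never calls itself.
find_real_tool() {
    local name=$1 skip_dir=$2 dir
    local IFS=:
    for dir in $PATH; do
        [ -z "$dir" ] && continue
        [ "$dir" = "$skip_dir" ] && continue
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Fill the array LD_CMD with: real linker, optional @file holding extra
# options, then the original arguments (quoted, so spaces survive).
build_ld_command() {
    local real_ld=$1; shift
    LD_CMD=("$real_ld")
    # Only add the @file argument when the options file actually exists.
    if [ -n "${EBLINKEROPTIONS:-}" ] && [ -f "$EBLINKEROPTIONS" ]; then
        LD_CMD+=("@$EBLINKEROPTIONS")
    fi
    LD_CMD+=("$@")
}

# In the wrapper itself one would finish with something like:
#   real_ld=$(find_real_tool ld "$(dirname "$0")") || exit 127
#   build_ld_command "$real_ld" "$@"
#   exec "${LD_CMD[@]}"
```

The directory comparison is a plain string match, so the wrapper directory has to appear in PATH spelled the same way as the value passed in; a real implementation would probably want to normalise both.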
@wpoely86 @boegel Torben Rasmussen's team in Sweden has such a script; we can ask them what it looks like.
@pforai: up for firing them a mail on it? Not sure if he's on GitHub.
@boegel just dispatched, you in CC.
Is this where we are working on this?
In any case, GCC calls collect2 rather than ld, and collect2 seems to be pretty good at finding the real ld.
Not sure what I should override. Maybe I can set an environment variable to force it to use my special ld?
access("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/collect2", X_OK) = 0
[pid 4142] execve("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/collect2", ["/software/easybuild/software/GCC"..., "-plugin", "/software/easybuild/software/GCC"..., "-plugin-opt=/software/easybuild/"..., "-plugin-opt=-fresolution=/tmp/cc"..., "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "-plugin-opt=-pass-through=-lpthr"..., "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "--eh-frame-hdr", "-m", "elf_x86_64", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", ...], [/* 104 vars */] <unfinished ...>
[pid 4142] <... execve resumed> ) = 0
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/real-ld", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/real-ld", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/real-ld", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/collect-ld", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/collect-ld", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/collect-ld", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/ld.gold", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/ld.gold", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/ld.gold", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/gnm", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/gnm", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/gnm", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/nm", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/nm", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/nm", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/gstrip", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/gstrip", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/gstrip", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/strip", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/strip", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/strip", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/gcc", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/gcc", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4142] stat("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/gcc", 0x7ffcc99185b0) = -1 ENOENT (No such file or directory)
[pid 4143] execve("/bin/ld.gold", ["/bin/ld.gold", "-plugin", "/software/easybuild/software/GCC"..., "-plugin-opt=/software/easybuild/"..., "-plugin-opt=-fresolution=/tmp/cc"..., "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "-plugin-opt=-pass-through=-lpthr"..., "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "--eh-frame-hdr", "-m", "elf_x86_64", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", ...], [/* 104 vars */] <unfinished ...>
[pid 4143] <... execve resumed> ) = 0
[pid 4143] open("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/liblto_plugin.so", O_RDONLY|O_CLOEXEC) = 3
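On the question of what to override: as far as I know, collect2 looks for the linker under several names (the trace above shows real-ld, collect-ld and ld.gold being tried) in the compiler's own program directories before falling back to PATH. GCC documents the -B option and the COMPILER_PATH environment variable as ways to add directories to that search. A wrapper directory that offers the same script under more than one linker name could be set up roughly as below; the function name and the paths in the comments are hypothetical.

```shell
#!/bin/bash
# Sketch: populate a directory with one wrapper under several linker names.
# make_ld_wrapper_dir and the example paths are hypothetical.

# make_ld_wrapper_dir WRAPPER_SCRIPT DIR
# Installs WRAPPER_SCRIPT as DIR/ld and links DIR/ld.gold and DIR/ld.bfd to it.
make_ld_wrapper_dir() {
    local wrapper=$1 dir=$2 name
    mkdir -p "$dir" || return 1
    cp "$wrapper" "$dir/ld" || return 1
    chmod +x "$dir/ld" || return 1
    for name in ld.gold ld.bfd; do
        # Relative symlink, resolved inside DIR.
        ln -sf ld "$dir/$name" || return 1
    done
}

# Possible use (not run here, assumes gcc is installed):
#   make_ld_wrapper_dir ./my-ld-wrapper.sh /tmp/eb-ldwrapper
#   gcc -B/tmp/eb-ldwrapper/ -print-prog-name=ld    # check which ld gcc would use
#   COMPILER_PATH=/tmp/eb-ldwrapper gcc -o prog prog.o
```

A wrapper installed this way would need to look at the name it was called under (for example via basename "$0") to decide which real linker to run.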
[easybuild@de0bb4c6145e samtools-1.2]$ echo $PATH
/tmp/eb-A0ozfo/eb-ldwrapper-sTMN3t:/export/easybuild/software/SAMtools/1.2-foss-2015a/bin:/export/easybuild/software/ncurses/5.9-foss-2015a/bin:/tmp/eb-A0ozfo/eb-ldwrapper-sTMN3t:/software/easybuild/software/FFTW/3.3.4-gompi-2015a/bin:/software/easybuild/software/OpenBLAS/0.2.13-GCC-4.9.2-LAPACK-3.5.0/bin:/software/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/bin:/software/easybuild/software/hwloc/1.10.0-GCC-4.9.2/bin:/software/easybuild/software/numactl/2.0.10-GCC-4.9.2/bin:/software/easybuild/software/GCC/4.9.2/bin:/software/easybuild/software/FPM/1.3.3-Ruby-2.1.6/bin:/software/easybuild/software/Ruby/2.1.6/bin:/software/easybuild-develop/easybuild-framework:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/easybuild/.local/bin:/home/easybuild/bin
[easybuild@de0bb4c6145e samtools-1.2]$ which ld
/tmp/eb-A0ozfo/eb-ldwrapper-sTMN3t/ld
seems like it is ok if I get ld.gold in the path.
[easybuild@de0bb4c6145e samtools-1.2]$ make
gcc -pthread -o samtools bam_index.o bam_plcmd.o sam_view.o bam_cat.o bam_md.o bam_reheader.o bam_sort.o bedidx.o kprobaln.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o faidx.o stats.o stats_isize.o bam_flags.o bam_split.o bam_tview.o bam_tview_curses.o bam_tview_html.o bam_lpileup.o libbam.a htslib-1.2.1/libhts.a -L/export/easybuild/software/ncurses/5.9-foss-2015a/lib -lcurses -lm -L/export/easybuild/software/zlib/1.2.8-foss-2015a/lib -lz
INFO: linking with rpath
INFO: linking with rpath and NSC symbols
INFO: RPATH : -rpath=/export/easybuild/software/SAMtools/1.2-foss-2015a/lib -rpath=/export/easybuild/software/ncurses/5.9-foss-2015a/lib -rpath=/export/easybuild/software/zlib/1.2.8-foss-2015a/lib -rpath=/software/easybuild/software/FFTW/3.3.4-gompi-2015a/lib -rpath=/software/easybuild/software/GCC/4.9.2/lib -rpath=/software/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -rpath=/software/easybuild/software/GCC/4.9.2/lib64 -rpath=/software/easybuild/software/OpenBLAS/0.2.13-GCC-4.9.2-LAPACK-3.5.0/lib -rpath=/software/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib -rpath=/software/easybuild/software/Ruby/2.1.6/lib -rpath=/software/easybuild/software/ScaLAPACK/2.0.2-gompi-2015a-OpenBLAS-0.2.13-LAPACK-3.5.0/lib -rpath=/software/easybuild/software/hwloc/1.10.0-GCC-4.9.2/lib -rpath=/software/easybuild/software/numactl/2.0.10-GCC-4.9.2/lib sym_str:
/bin/ld.gold: error: cannot open : No such file or directory
collect2: error: ld returned 1 exit status
make: *** [samtools] Error 1
[pid 14357] execve("/tmp/eb-A0ozfo/eb-ldwrapper-sTMN3t/ld.gold", ["/tmp/eb-A0ozfo/eb-ldwrapper-sTMN"..., "-plugin", "/software/easybuild/software/GCC"..., "-plugin-opt=/software/easybuild/"..., "-plugin-opt=-fresolution=/tmp/cc"..., "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "-plugin-opt=-pass-through=-lpthr"..., "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "--eh-frame-hdr", "-m", "elf_x86_64", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", ...], [/* 104 vars */] <unfinished ...>
[pid 14357] <... execve resumed> ) = 0
[pid 14360] execve("/usr/bin/tr", ["/usr/bin/tr", "[:upper:]", "[:lower:]"], [/* 103 vars */]) = 0
[pid 14741] execve("/bin/sort", ["sort", "-u"], [/* 103 vars */]) = 0
[pid 14752] execve("/bin/grep", ["grep", "NSC_"], [/* 103 vars */]) = 0
[pid 14751] execve("/bin/env", ["env"], [/* 103 vars */]) = 0
[pid 14753] execve("/bin/ld.gold", ["/bin/ld.gold", "-rpath=/export/easybuild/softwar"..., "-plugin", "/software/easybuild/software/GCC"..., "-plugin-opt=/software/easybuild/"..., "-plugin-opt=-fresolution=/tmp/cc"..., "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "-plugin-opt=-pass-through=-lpthr"..., "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_"..., "--eh-frame-hdr", "-m", "elf_x86_64", "-dynamic-linker", ...], [/* 103 vars */]) = 0
[pid 14753] open("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/liblto_plugin.so", O_RDONLY|O_CLOEXEC) = 3
[pid 14348] access("/software/easybuild/software/GCC/4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/liblto_plugin.rpo", R_OK) = -1 ENOENT (No such file or directory)
I need to tune the 100 cut forks in the script as well....
Do you want me to push my WIP?
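If the forks mentioned above come from splitting colon-separated lists with cut (an assumption, since the script itself is not shown here), bash can do that splitting with builtins only. A minimal sketch, with a hypothetical function name and a hypothetical global array PARTS:

```shell
#!/bin/bash
# Sketch: split a colon-separated list using bash builtins only (no cut).
# split_colon_list and PARTS are hypothetical names.

# split_colon_list "a:b:c" fills the global array PARTS with a, b and c,
# skipping empty entries such as the one in "a::b".
split_colon_list() {
    local list=$1 entry
    local -a raw
    PARTS=()
    # IFS is changed for this read only; -r keeps backslashes literal.
    IFS=: read -r -a raw <<< "$list"
    for entry in "${raw[@]}"; do
        [ -n "$entry" ] && PARTS+=("$entry")
    done
    return 0
}

# Possible use:
#   split_colon_list "$LD_LIBRARY_PATH"
#   for dir in "${PARTS[@]}"; do echo "-rpath=$dir"; done
```

read only consumes the first line of its input, so this assumes no directory name contains a newline.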
So this is sort of working now (though I understand we are going in a different direction).
One odd thing is that using the script I get output like this
0x000000000000000f (RPATH) Library rpath: [/export/easybuild/software/ncurses/5.9-foss-2015a/lib -rpath=/export/easybuild/software/zlib/1.2.8-foss-2015a/lib -rpath=/software/easybuild/software/FFTW/3.3.4-gompi-2015a/lib -rpath=/software/easybuild/software/GCC/4.9.2/lib -rpath=/software/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -rpath=/software/easybuild/software/GCC/4.9.2/lib64 -rpath=/software/easybuild/software/OpenBLAS/0.2.13-GCC-4.9.2-LAPACK-3.5.0/lib -rpath=/software/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib -rpath=/software/easybuild/software/Ruby/2.1.6/lib -rpath=/software/easybuild/software/ScaLAPACK/2.0.2-gompi-2015a-OpenBLAS-0.2.13-LAPACK-3.5.0/lib -rpath=/software/easybuild/software/binutils/2.25-GCC-4.9.3-binutils-2.25/lib -rpath=/software/easybuild/software/hwloc/1.10.0-GCC-4.9.2/lib -rpath=/software/easybuild/software/numactl/2.0.10-GCC-4.9.2/lib]
So it is detecting the first -rpath= and not seeing the spaces, thus concatenating that all together. That seems wrong...
@ocaisa have you seen anything like that?
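One possible explanation for the single concatenated RPATH entry (a guess, since the script is not shown here) is that all the -rpath= options reach the linker as one quoted string, so everything after the first -rpath= is taken as the path. Keeping the options in a bash array gives one argument per directory. The function names below are hypothetical.

```shell
#!/bin/bash
# Sketch: one linker argument per RPATH directory.
# rpath_args, count_args and RPATH_ARGS are hypothetical names.

# rpath_args DIR... fills the global array RPATH_ARGS with one -rpath=DIR
# element per directory, so each one stays a separate argument.
rpath_args() {
    local dir
    RPATH_ARGS=()
    for dir in "$@"; do
        RPATH_ARGS+=("-rpath=$dir")
    done
}

# count_args prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# Possible use:
#   rpath_args /opt/ncurses/lib /opt/zlib/lib
#   flat="${RPATH_ARGS[*]}"
#   count_args "$flat"               # a single argument: one long -rpath=... string
#   count_args "${RPATH_ARGS[@]}"    # one argument per directory
#   exec "$real_ld" "${RPATH_ARGS[@]}" "$@"    # real_ld: path to the real linker
```

Expanding the array as "${RPATH_ARGS[@]}" also keeps directories containing spaces intact, which an unquoted string expansion would not.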
Sorry, I've done a lot of reading but nothing in practice. I can have a little fun with the environment variable approach next week, that'll be easy to take for a test drive.
WIP PR by @rjeschmi: #1613
Using RPATH as an alternative to modules for handling dependencies should be supported by the EasyBuild framework.