Closed adimako closed 8 years ago
What is the commit ID of the stack?
PROTEUS : /u/adimako/proteus
PROTEUS_ARCH : garnet.gnu
PROTEUS_PREFIX : /u/adimako/proteus/garnet.gnu
PROTEUS_VERSION : f9f37d0de2834eb038171fb47d0a854de443f9c9
HASHDIST_VERSION : 71d335be9ee04e3cc9a9df92a9348a2d8e3ed607
HASHSTACK_VERSION: ed55f4e10f07eb0b85fa6a0d15f4d0e5104902c0
I think you need to pull the stable/copper branches of both proteus and hashdist unless you've done a merge locally. The latest commits on those branches are:
Proteus: https://github.com/erdc-cm/proteus/commit/4bd41f16fda9783ead3ece830ad4ea459be82990
hashstack: https://github.com/hashdist/hashstack/commit/4e8a64e519678664a917b0bda4132cf18437f593
@cekees this solved things, thanks. The packages are now installed. However, during compilation of proteus I get the following errors:
cd stack && /u/adimako/proteus/hashdist/bin/hit develop -v -f -k error default.yaml /u/adimako/proteus/garnet.gnu
launcher:Unable to launch '/lustre/usr/local/u/adimako/proteus/garnet.gnu/bin/../../../../../lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/python2.7' (No such file or directory)
make: *** [/u/adimako/proteus/garnet.gnu/artifact.json] Error 127
I have seen this before, and I think it is caused by broken links in $PROTEUS_ARCH. Listing the files in the garnet.gnu/bin folder:
drwxr----- 2 adimako 0089JR40 4096 Feb 18 11:51 .
drwxr----- 6 adimako 0089JR40 4096 Feb 18 11:51 ..
lrwxrwxrwx 1 adimako 0089JR40 84 Feb 18 11:51 2to3 -> ../../../../../lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/2to3
lrwxrwxrwx 1 adimako 0089JR40 84 Feb 18 11:51 idle -> ../../../../../lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/idle
-rwxr-xr-x 1 adimako 0089JR40 10552 Feb 18 11:51 launcher
lrwxrwxrwx 1 adimako 0089JR40 85 Feb 18 11:51 pydoc -> ../../../../../lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/pydoc
lrwxrwxrwx 1 adimako 0089JR40 7 Feb 18 11:51 python -> python2
lrwxrwxrwx 1 adimako 0089JR40 9 Feb 18 11:51 python2 -> python2.7
lrwxrwxrwx 1 adimako 0089JR40 8 Feb 18 11:51 python2.7 -> launcher
lrwxrwxrwx 1 adimako 0089JR40 96 Feb 18 11:51 python2.7-config -> ../../../../../lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/python2.7-config
-rw-r----- 1 adimako 0089JR40 89 Feb 18 11:51 python2.7.link
lrwxrwxrwx 1 adimako 0089JR40 16 Feb 18 11:51 python2-config -> python2.7-config
lrwxrwxrwx 1 adimako 0089JR40 14 Feb 18 11:51 python-config -> python2-config
lrwxrwxrwx 1 adimako 0089JR40 88 Feb 18 11:51 smtpd.py -> ../../../../../lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/smtpd.py
Most of these links are broken as the paths do not exist. I could fix them, but I am not sure how launcher works. I will look at the README file in hashdist
I think it is because the / folder contains a link /u that points to the folder /usr/local/u, and this causes the confusion. I have seen similar issues in hydra as well. For now I will hardcode the links manually, but we should find a more consistent solution to this.
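To illustrate the suspected mechanism, here is a minimal sketch (not hashdist's actual code) of how a relative link computed from a path that passes through a symlinked directory ends up dangling. The directory names (real, u, home, store, profile, tool) are made up for the demo and stand in for the /u link and the .hashdist store. The key point is that the kernel resolves ".." against the physical directory, while os.path.relpath only works on the path string.

```python
import os
import tempfile


def relative_link_target(src, link_path, resolve):
    """Return the relative target a symlink at link_path needs to reach src.

    With resolve=False the computation is purely textual, so a symlinked
    directory in link_path is counted as if it were a real directory.
    """
    link_dir = os.path.dirname(link_path)
    if resolve:
        src = os.path.realpath(src)
        link_dir = os.path.realpath(link_dir)
    return os.path.relpath(src, link_dir)


def demo():
    # Hypothetical layout: root/real/u/home is the physical home directory,
    # and root/u is a shortcut link to root/real/u (like /u on the cluster).
    root = os.path.realpath(tempfile.mkdtemp())
    real_home = os.path.join(root, "real", "u", "home")
    os.makedirs(os.path.join(real_home, "store", "bin"))
    os.makedirs(os.path.join(real_home, "profile", "bin"))
    tool = os.path.join(real_home, "store", "bin", "tool")
    open(tool, "w").close()
    os.symlink(os.path.join(root, "real", "u"), os.path.join(root, "u"))

    # The link is created through the shortcut path, as in the build.
    link_path = os.path.join(root, "u", "home", "profile", "bin", "tool")
    results = {}
    for resolve in (False, True):
        if os.path.lexists(link_path):
            os.remove(link_path)
        os.symlink(relative_link_target(tool, link_path, resolve), link_path)
        # os.path.exists follows the link, so False means it is dangling.
        results[resolve] = os.path.exists(link_path)
    return results


if __name__ == "__main__":
    print(demo())
```

On a POSIX system the unresolved variant should give a dangling link and the resolved variant a working one, which is consistent with the doubled /lustre/usr/local prefix in the launcher error.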
If I remove the garnet.gnu folder, I get an error from hit command saying
[ERROR] [Errno 17] File exists in silent_absolute_symlink('/lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/2to3', u'/u/adimako/proteus/garnet.gnu/bin/2to3')
[profile|ERROR] hit command failed: [Errno 17] [Errno 17] File exists in silent_absolute_symlink('/lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/2to3', u'/u/adimako/proteus/garnet.gnu/bin/2to3')
Meaning you did make distclean and then make develop and you now get that error? What is the output of echo $HOME?
Yep, setting $HOME to the actual path, not the symbolic link, works.
I think hashdist picks up that something is wrong:
File exists in silent_absolute_symlink('/lustre/usr/local/u/adimako/.hashdist/bld/python/3oefiwa4r63i/bin/2to3', u'/u/adimako/proteus/garnet.gnu/bin/2to3')
but then decides to go with a path anyway.
Is there a way to set this manually, e.g. by using an env variable that bypasses the automated procedure?
Or maybe it does not pick it up. But it does show two different paths for the same home folder, and this may be what causes the issue.
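A quick way to check whether $HOME goes through a link is to compare it with its resolved path. The helper below is only a sketch; describe_home is a hypothetical name and nothing here is part of hashdist or proteus.

```python
import os


def describe_home(env=None):
    """Compare $HOME as set in env with the physical path it resolves to."""
    env = os.environ if env is None else env
    home = env.get("HOME", "")
    real = os.path.realpath(home) if home else ""
    differs = bool(home) and os.path.normpath(home) != real
    return {"home": home, "real": real, "differs": differs}


if __name__ == "__main__":
    info = describe_home()
    if info["differs"]:
        # Suggest the physical path, to be exported before running the build.
        print("HOME goes through a symlink; the physical path is:")
        print("  export HOME=%s" % info["real"])
    else:
        print("HOME is already a physical path: %s" % info["home"])
```

If the two paths differ, exporting the printed value before make develop is the manual override being asked about here.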
Just to make sure I'm clear: you did something like export HOME=/lustre/usr/local/u/adimako and the build completed successfully? @zhang-alvin was having the same error yesterday on another cluster, but I'm not sure how he resolved it. On his machine we found that inside hashdist we were getting a contradiction in this bit of code:
try:
    os.symlink(os.path.abspath(src), dst)
except OSError:
    if not os.path.exists(dst):
        raise
OSError was being raised with a code equivalent to "File Exists" but os.path.exists(dst) returned false.
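One situation that produces exactly this combination is a dangling link already sitting at dst: os.symlink fails with EEXIST because the link itself exists, while os.path.exists follows the link and reports False because its target is missing. os.path.lexists checks the link itself. The sketch below only demonstrates that standard-library behaviour in a temporary directory; whether this is what happened on that cluster is an assumption.

```python
import errno
import os
import tempfile


def probe_dangling_link():
    """Create a dangling symlink, then try to create another link over it."""
    workdir = tempfile.mkdtemp()
    dst = os.path.join(workdir, "link")
    # First link points at a file that does not exist, so it is dangling.
    os.symlink(os.path.join(workdir, "missing"), dst)
    try:
        os.symlink(os.path.join(workdir, "other"), dst)
        got_eexist = False
    except OSError as exc:
        got_eexist = exc.errno == errno.EEXIST
    # exists() follows the link; lexists() looks at the link itself.
    return got_eexist, os.path.exists(dst), os.path.lexists(dst)


if __name__ == "__main__":
    print(probe_dangling_link())
```

If that is the cause, checking os.path.lexists(dst) rather than os.path.exists(dst) in the except branch would tell the two cases apart.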
Yep, I included the line of code you quoted in .bashrc and it proceeded with the compilation. Now I am trying to find out where the acml library is. (I have loaded the module but cannot find the library in LD_LIBRARY_PATH.)
Try module help acml or env | grep ACML after you load the module. I believe my LD_LIBRARY_PATH is set to the correct path in ~/.cekees/.cshrc, which you should be able to read.
Now I get an illegal instruction error in the partition test. I think it has to do with the first line, #!/usr/bin/env python. This runs normally on the command line and starts python in console mode.
(acml lib problem solved)
Another thing to note is that I log in to a bash shell by default, not csh/tcsh.
Are you running an interactive job on the back end? Basically on these HPC machines no proteus tests will run on the login nodes because the mpi subsystem is disabled. See the qi alias in my .cshrc for how to run an interactive job.
@cekees I see. Maybe it is worth submitting the cases directly to the cluster to see how it goes.
I think these issues have been resolved, right?
@cekees Yes you can close it for now.
@cekees @tridelat as discussed, we get the following error while building scipy in copper. At some point it says to recompile using the -fPIC option, which we have done, but we still get the same error:
[scipy] scipy/integrate/quadpack.h:804: warning: call to function 'dqawce_' without a real prototype
[scipy] scipy/integrate/_quadpack.h:60: note: 'dqawce' was declared here
[scipy] scipy/integrate/quadpack.h: In function 'quadpack_qawse':
[scipy] scipy/integrate/quadpack.h:884: warning: call to function 'dqawse_' without a real prototype
[scipy] scipy/integrate/_quadpack.h:59: note: 'dqawse' was declared here
[scipy] scipy/integrate/quadpack.h:891: warning: call to function 'dqawse_' without a real prototype
[scipy] scipy/integrate/_quadpack.h:59: note: 'dqawse' was declared here
[scipy] /usr/bin/gfortran -Wall -fPIC -shared -Wl,-rpath=/lustre/usr/local/u/adimako/.hashdist/bld/python/vgl7ugq3lahf/lib -Wl,-rpath=/opt/acml/5.3.1/gfortran64/lib -L/lustre/usr/local/u/adimako/.hashdist/bld/python/vgl7ugq3lahf/lib/python2.7/config -lpython2.7 -lpthread -ldl -lutil -lm -Xlinker -export-dynamic build/temp.linux-x86_64-2.7/scipy/integrate/_quadpackmodule.o -Lbuild/temp.linux-x86_64-2.7 -lquadpack -llinpack_lite -lmach -lgfortran -o build/lib.linux-x86_64-2.7/scipy/integrate/_quadpack.so
[scipy] /usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: /usr/lib64/gcc/x86_64-suse-linux/4.3/libgfortran.a(stop.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
[scipy] /usr/lib64/gcc/x86_64-suse-linux/4.3/libgfortran.a: could not read symbols: Bad value
[scipy] collect2: ld returned 1 exit status
[scipy] /usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: /usr/lib64/gcc/x86_64-suse-linux/4.3/libgfortran.a(stop.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
[scipy] /usr/lib64/gcc/x86_64-suse-linux/4.3/libgfortran.a: could not read symbols: Bad value
[scipy] collect2: ld returned 1 exit status
[scipy] error: Command "/usr/bin/gfortran -Wall -fPIC -shared -Wl,-rpath=/lustre/usr/local/u/adimako/.hashdist/bld/python/vgl7ugq3lahf/lib -Wl,-rpath=/opt/acml/5.3.1/gfortran64/lib -L/lustre/usr/local/u/adimako/.hashdist/bld/python/vgl7ugq3lahf/lib/python2.7/config -lpython2.7 -lpthread -ldl -lutil -lm -Xlinker -export-dynamic build/temp.linux-x86_64-2.7/scipy/integrate/_quadpackmodule.o -Lbuild/temp.linux-x86_64-2.7 -lquadpack -llinpack_lite -lmach -lgfortran -o build/lib.linux-x86_64-2.7/scipy/integrate/_quadpack.so" failed with exit status 1
[scipy|ERROR] Command '[u'/bin/bash', '_hashdist/build.sh']' returned non-zero exit status 1
[scipy|ERROR] command failed (code=1); raising
make: *** [/u/adimako/proteus/garnet.gnu/artifact.json] Error 127
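Reading the ld message closely, the object it complains about is stop.o inside the system's static libgfortran.a, not one of the scipy objects, which may be why adding -fPIC to the scipy build does not change anything. The small parser below is only a reading aid for such logs; find_non_pic_objects and the regular expression are assumptions written for this message format, not part of any build tool.

```python
import re

# Matches linker lines of the form
#   .../ld: <archive>.a(<member>): relocation <TYPE> against ...
NON_PIC = re.compile(
    r"ld: (?P<archive>\S+\.a)\((?P<member>[^)]+)\): "
    r"relocation (?P<reloc>\S+) against"
)


def find_non_pic_objects(log_text):
    """Return unique (archive, member, relocation) triples named by ld."""
    seen = []
    for match in NON_PIC.finditer(log_text):
        item = (match.group("archive"), match.group("member"), match.group("reloc"))
        if item not in seen:
            seen.append(item)
    return seen


if __name__ == "__main__":
    sample = (
        "[scipy] /usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../"
        "x86_64-suse-linux/bin/ld: "
        "/usr/lib64/gcc/x86_64-suse-linux/4.3/libgfortran.a(stop.o): "
        "relocation R_X86_64_32 against `.rodata.str1.1' can not be used "
        "when making a shared object; recompile with -fPIC"
    )
    for archive, member, reloc in find_non_pic_objects(sample):
        print("%s in %s (%s)" % (member, archive, reloc))
```

For the log above this points at the static Fortran runtime shipped with the system gcc 4.3, so the thing to look at is which libgfortran the link step picks up rather than the scipy compile flags.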