Closed desilva2185 closed 7 years ago
I think I saw this once and got it fixed somehow. I guess the root of the problem is a multilevel binary attachment, which PoD injects into the body of the job script.
I will look for a possibility to silence PBS.
In a minute I will build a version with this fix for you to test.
Thanks !
regards, Asoka
Adr: ATLAS Tier-1, TRIUMF, 4004 Wesbrook Mall, Vancouver B.C. V6T 2A3, Canada Url: http://trshare.triumf.ca/~desilva/Personal Tel: (604) 222-7496
Can you please try this version to make sure it works on your Torque env? http://pod.gsi.de/releases/pod/nightly/PoD-3.16.1.g43df-Source.tar.gz
Please let me know whether it fixes the issue. Feel free to re-open if issue persists.
Ok, will try and let you know but it may be next week (just came back and am swamped.)
regards, Asoka
Sounds good, Asoka! Meanwhile I will investigate the other issue you reported.
Hi Anar,
Finally got the chance to test this. Apparently this depends on uuencode being available and this is not installed by default on vanilla SL6 OS.
Some sites have it (lxplus, TRIUMF T1 and T3) while others do not (Australia testbed and my testbed).
Is there an alternative or is this a requirement ?
Thanks !
regards, Asoka
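Since the fix relies on uuencode/uudecode being installed on every node, a quick per-node check can help before submitting. The following is a minimal sketch, not part of PoD; `missing_tools` is a hypothetical helper name, and it only tests whether the commands are on PATH.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one name per missing tool; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: the two tools discussed in this thread.
missing=$(missing_tools uuencode uudecode)
if [ -n "$missing" ]; then
    echo "missing on $(uname -n): $missing"
fi
```

Run on each worker node (for example through the batch system itself), this shows where the encoding tools still need to be installed.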
Hi Anar, I have hit this problem with my latest installation of Torque/Maui... is the solution you propose still valid or is there something better? I am using PoD 3.16.
Isidro
Hi Isidro,
I haven't had a chance to investigate other solutions. Please try this one. If it doesn't help, we will figure something else out.
Hi Anar, It seems to work provided I install uuencode on all my nodes (through the sharutils package in SLC6). I am now facing another problem when submitting, with the following error message:
qsub: submit error (Invalid request MSG=cannot locate new job 17[].xxxxx.uniovi.es (0 - Success))
No idea why. Cheers, Isidro
Hi Anar, Actually I traced the problem to some issue with PoD. It seems the pod-worker file is nowhere in my installation (or in my area) and it fails when the submitted script tries to copy it. Any hint?
Update: If I understand correctly, what I just reported is not a problem, but a way to check whether we are using shared folders. Going further down in the log, it seems there is some issue with tar:
Writing files in node's directory /tmp/iglez/PoD_2Cq4hsLUrd
cp: cannot stat `/mnt_pool/fanae105/user/iglez/.PoD/wrk/pod-worker': No such file or directory
uudecode: stdin: Short file
gzip: stdin: unexpected end of file
PoDWorker.sh
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
...
...
*** [Fri, 24 Mar 2017 17:39:19 +0100] +++ PoD Worker START +++
*** [Fri, 24 Mar 2017 17:39:19 +0100] Current working directory: /tmp/iglez/PoD_2Cq4hsLUrd
*** [Fri, 24 Mar 2017 17:39:19 +0100] Untar payload...
uudecode: stdin: Short file
gzip: stdin: unexpected end of file
xpd.cf
PoD.cfg
version
server_info.cfg
pod-wrk-bin-3.16.1.g43df-Darwin-universal.tar.gz
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
*** [Fri, 24 Mar 2017 17:39:19 +0100] host's CPU/instruction set: amd64
*** [Fri, 24 Mar 2017 17:39:19 +0100] PoD worker runs on Linux-x86_64
*** [Fri, 24 Mar 2017 17:39:19 +0100] Error: Can't find WN pre-compiled bin.: /tmp/iglez/PoD_2Cq4hsLUrd/pod-wrk-bin-3.16.1.g43df-Linux-amd64.tar.gz
*** [Fri, 24 Mar 2017 17:39:19 +0100] Starting the cleaning procedure...
*** [Fri, 24 Mar 2017 17:39:19 +0100] Gracefully shut down PoD worker process(es):
*** [Fri, 24 Mar 2017 17:39:19 +0100] done cleaning up.
So, it may be that tar and uuencode/uudecode are not working properly?
Isidro
Is this folder "/mnt_pool/fanae105/user/iglez" actually shared between the submit host and WNs on your PBS?
If yes, can you please pack and send me your ~/.PoD dir?
Hi Anar, Yep. That is the home folder and it is shared. Actually, the problem seems to come (if I traced it back properly) from the uudecode call, which seems unable to deal with the payload. I added "set -x" here and there in the scripts to get the maximum output when they are run. I have just sent you by mail the .PoD folder (tar.gz) and the output of one of the failing jobs.
Cheers,
Isidro
Isidro, man, can you please send me the email with the archive again? I lost it somehow. Can't find it anymore. :(
Hi Anar,
Just resent it a minute ago. Let me know if you receive it. Thanks a lot for your help!
Isidro
From: Anar Manafov [notifications@github.com] Sent: 30 March 2017 14:05 To: AnarManafov/PoD Cc: Isidro Gonzalez Caballero; Comment Subject: Re: [AnarManafov/PoD] Torque-4.2.7 and pod-submit failing (#1)
Isidro, just for my info. Did you try the current master or a tagged 3.16?
Hi Anar, I tried the version you suggested above from this link:
http://pod.gsi.de/releases/pod/nightly/PoD-3.16.1.g43df-Source.tar.gz
I added set -x to many scripts to get very verbose output. I also added a line at the end of the payload after encoding, because in a couple of trials I could not decode the file otherwise. Without modifications it did not work either.
Cheers,
Isidro
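The "Short file" message from uudecode in the log above is what one would expect when the encoded stream ends before its closing line. A minimal sketch of a framing check follows. It assumes the encoded payload has been saved to its own file and uses classic (non-base64) uuencode framing; `check_uu_framing` is a hypothetical helper, not a PoD script.

```shell
#!/bin/sh
# Check that a uuencoded text file has the framing uudecode expects:
# a first line starting with "begin " and a final line reading "end".
# Assumption: classic uuencode output stored in a regular file.
check_uu_framing() {
    file=$1
    first=$(head -n 1 "$file")
    last=$(tail -n 1 "$file")
    case $first in
        "begin "*) ;;
        *) echo "no begin line"; return 1 ;;
    esac
    if [ "$last" != "end" ]; then
        echo "no end line (truncated?)"
        return 1
    fi
    echo "ok"
}
```

A payload that fails this check will not decode cleanly, which would in turn explain the gzip and tar "unexpected EOF" errors further down the pipeline.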
Ok, thanks. I think I found the bug. Looking for a proper fix now...
Isidro, please try the current master. The issue should be fixed by d3f67dd6db11d03fe2db05835714f635c6cad60a.
Feel free to reopen if the issue persists.
Hi Anar, Now I am having problems compiling. Actually the problem seems to come from cmake. I have tried three different versions of cmake with similar results. The output is
loading initial cache file ../BuildSetup.cmake
CMake Warning (dev) in CMakeLists.txt:
Syntax Warning in cmake code at
/nfs/fanae/PoD_releases/PoD-master-Source/CMakeLists.txt:102:28
Argument not separated from preceding token by whitespace.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) in CMakeLists.txt:
Syntax Warning in cmake code at
/nfs/fanae/PoD_releases/PoD-master-Source/CMakeLists.txt:105:28
Argument not separated from preceding token by whitespace.
This warning is for project developers. Use -Wno-dev to suppress it.
-- The C compiler identification is GNU 5.3.0
-- The CXX compiler identification is GNU 5.3.0
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Doxygen: /usr/bin/doxygen (found version "1.6.1")
CMAKE VERSION 2.8.12.2
-- Boost version: 1.41.0
-- Boost version: 1.41.0
-- Found the following Boost libraries:
-- thread
-- program_options
-- filesystem
-- system
-- unit_test_framework
CMake Warning (dev) at CMakeLists.txt:260 (add_subdirectory):
The source directory
/nfs/fanae/PoD_releases/PoD-master-Source/app/MiscCommon
does not contain a CMakeLists.txt file.
CMake does not support this case but it used to work accidentally and is
being allowed for compatibility.
Policy CMP0014 is not set: Input directories must have CMakeLists.txt. Run
"cmake --help-policy CMP0014" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Build pod-agent - YES
-- Build pod-agent unit tests - YES
-- Build pod-info - YES
-- Build pod-remote - YES
-- Build pod-user-defaults - YES
-- Build pod-ssh - YES
-- Build pod-ssh unit tests - YES
-- Configuring done
CMake Error at CMakeLists.txt:108 (add_custom_target):
Error evaluating generator expression:
$<TARGET_FILE:pod_protocol>
No target "pod_protocol"
CMake Error at CMakeLists.txt:108 (add_custom_target):
Error evaluating generator expression:
$<TARGET_FILE:proof_status_file>
No target "proof_status_file"
CMake Error at CMakeLists.txt:108 (add_custom_target):
Error evaluating generator expression:
$<TARGET_FILE:SSHTunnel>
No target "SSHTunnel"
-- Generating done
-- Build files have been written to: /nfs/fanae/PoD_releases/PoD-master-Source/build
I have updated the cmake cfg to support modern versions of cmake.
Also, you forgot to execute "git submodule update --init". This is the reason for:
/nfs/fanae/PoD_releases/PoD-master-Source/app/MiscCommon
does not contain a CMakeLists.txt file.
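A missing submodule checkout can be caught before running cmake. The sketch below is an illustration only; `check_cmake_dirs` is a hypothetical helper, and the only directory it is shown with is the one named in the warning above.

```shell
#!/bin/sh
# Return success if every listed subdirectory of a source tree contains a
# CMakeLists.txt; print the ones that do not.
check_cmake_dirs() {
    src=$1
    shift
    status=0
    for dir in "$@"; do
        if [ ! -f "$src/$dir/CMakeLists.txt" ]; then
            echo "missing: $dir/CMakeLists.txt"
            status=1
        fi
    done
    return $status
}

# Typical use from the top of the checkout:
#   git submodule update --init
#   check_cmake_dirs . app/MiscCommon
```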
Hi again Anar,
Thanks a lot for the quick answer! Now everything builds clean and smooth (no warnings). BTW, I could not find the git submodule... command in the documentation.
Now, the next step, getting the binaries, is where it fails :(
$ pod-server getbins
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
I guess they are in http://pod.gsi.de/releases/add/3.17/, so it is probably just a question of changing etc/version. Right?
Cheers,
Isidro
Since you build from source, you also need to build the WN bins. This is a pretty hidden feature (need to document :) ).
make -j wn_bin
make -j install
Hi Anar again, There is still something missing. After executing those two commands (I even retried everything from the beginning) I get errors:
[iglez@fanae128 build]$ make -j wn_bin
[ 30%] [ 38%] [ 38%] Built target SSHTunnel
Built target proof_status_file
Built target pod-user-defaults
[ 38%] Built target pod_protocol
[ 92%] Built target pod-agent
[100%] Generate WN binary package
[100%] Built target wn_bin
[iglez@fanae128 build]$ pod-server getbins
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
[iglez@fanae128 build]$ pod-server start
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
I see the binary in the build directory:
[iglez@fanae128 build]$ ls
app CMakeFiles CPackConfig.cmake install_manifest.txt pod-wrk-bin
CMakeCache.txt cmake_install.cmake CPackSourceConfig.cmake Makefile pod-wrk-bin-3.17.2.g2ebb-Linux-amd64.tar.gz
But I cannot find it in the installation path. A configuration missing?
Cheers,
Isidro
No problem, man!
The order is important. You have to build wn_bin and then install it. If you just build the wn_bin target, it won't get installed. You need to trigger the install target as well. So, try it as I posted above:
make -j wn_bin
make -j install
Hi Anar, Thanks again for the fast answer. I was not clear in my previous post. But I had done that and it doesn't fix the problem. I posted the result so you see in the output it was executed (nothing to redo).
I even tried it from scratch: cmake + make + make wn_bin + make install with no success.
Isidro
Indeed, my bad. I failed to check the rest of the message. Sorry for that.
The getbins argument is only for central installations of released PoD versions. Since you built a version on your own, you should have wn_bin for your system already. So, just go ahead and try to start the server.
Please let me know the result. I don't have an environment at the moment, so I can't check it myself. It has been a long time since I touched PoD :(
Hi Anar,
I am aiming at a central installation (/nfs/fanae/PoD_releases/PoD-master.slc6) from the source code (/nfs/fanae/PoD_releases/PoD-master-Source), since this is used by a few people on our farm combined with our framework (PAF). I followed your instructions on a clean new area and got the same issues again. Full log below:
[iglez@fanae128 PoD_releases]$ grep INSTALL PoD-master-Source/BuildSetup.cmake
SET (CMAKE_INSTALL_PREFIX "/nfs/fanae/PoD_releases/PoD-master.slc6" CACHE PATH "Install path prefix, prepended onto install directories." FORCE)
[iglez@fanae128 PoD_releases]$ mkdir PoD-master.slc6
[iglez@fanae128 PoD_releases]$ cd PoD-master-Source/
[iglez@fanae128 PoD-master-Source]$ mkdir build
[iglez@fanae128 PoD-master-Source]$ cd build/
[iglez@fanae128 build]$ cmake -C ../BuildSetup.cmake ..
loading initial cache file ../BuildSetup.cmake
-- The C compiler identification is GNU 5.3.0
-- The CXX compiler identification is GNU 5.3.0
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Doxygen: /usr/bin/doxygen (found version "1.6.1")
-- Using BOOST Library dir:
-- Boost version: 1.41.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Boost version: 1.41.0
-- Found the following Boost libraries:
-- thread
-- program_options
-- filesystem
-- system
-- unit_test_framework
-- date_time
-- Build the pipe_log_engine lib - YES
-- Build the pod_protocol lib - YES
-- Build the proof_status_file lib - YES
-- Build the SSHTunnel lib - YES
-- Build the pod_sys_files lib - YES
-- Build MiscCommon unit tests - YES
-- Build pod-agent - YES
-- Build pod-agent unit tests - YES
-- Build pod-info - YES
-- Build pod-remote - YES
-- Build pod-user-defaults - YES
-- Build pod-ssh - YES
-- Build pod-ssh unit tests - YES
-- Configuring done
-- Generating done
-- Build files have been written to: /nfs/fanae/PoD_releases/PoD-master-Source/build
[iglez@fanae128 build]$ make -j
Scanning dependencies of target proof_status_file
Scanning dependencies of target MiscCommon_test_MiscUtils
Scanning dependencies of target pod_sys_files
Scanning dependencies of target pod_protocol
Scanning dependencies of target pod-ssh_test_config
Scanning dependencies of target MiscCommon_test_SysHelper
Scanning dependencies of target pod-ssh_test_threadpool
Scanning dependencies of target MiscCommon_test_Process
Scanning dependencies of target SSHTunnel
Scanning dependencies of target MiscCommon_test_FindCfgFile
Scanning dependencies of target pipe_log_engine
Scanning dependencies of target pod-user-defaults
[ 9%] Building CXX object app/MiscCommon/proof_status_file/CMakeFiles/proof_status_file.dir/ProofStatusFile.cpp.o
[ 9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_FindCfgFile.dir/Test_FindCfgFile.cpp.o
[ 9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_SysHelper.dir/Test_SysHelper.cpp.o
[ 9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_Process.dir/Test_Process.cpp.o
[ 9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_MiscUtils.dir/Test_MiscUtils.cpp.o
[ 25%] Building CXX object app/MiscCommon/pipe_log_engine/CMakeFiles/pipe_log_engine.dir/logEngine.cpp.o
[ 25%] Building CXX object app/pod-ssh/tests/CMakeFiles/pod-ssh_test_threadpool.dir/test_threadpool.cpp.o
[ 25%] Building CXX object app/pod-ssh/tests/CMakeFiles/pod-ssh_test_config.dir/test_config.cpp.o
[ 25%] Building CXX object app/MiscCommon/pod_protocol/CMakeFiles/pod_protocol.dir/ProtocolCommands.cpp.o
[ 25%] Building CXX object app/MiscCommon/pod_sys_files/CMakeFiles/pod_sys_files.dir/PoDSysFiles.cpp.o
[ 25%] Building CXX object app/MiscCommon/SSHTunnel/CMakeFiles/SSHTunnel.dir/SSHTunnel.cpp.o
[ 25%] Building CXX object app/MiscCommon/pod_protocol/CMakeFiles/pod_protocol.dir/Protocol.cpp.o
[ 25%] Building CXX object app/pod-ssh/tests/CMakeFiles/pod-ssh_test_config.dir/__/src/config.cpp.o
[ 26%] Building CXX object app/pod-user-defaults/CMakeFiles/pod-user-defaults.dir/src/main.cpp.o
[ 28%] Linking CXX shared library libSSHTunnel.so
[ 30%] Linking CXX shared library libpod_protocol.so
[ 30%] Built target SSHTunnel
[ 32%] Linking CXX executable MiscCommon_test_MiscUtils
[ 34%] Linking CXX executable pod-ssh_test_config
[ 34%] Built target pod_protocol
Scanning dependencies of target pod-agent_test_Protocol
Scanning dependencies of target pod-agent_test_ProtocolCommands
[ 36%] Linking CXX executable MiscCommon_test_FindCfgFile
[ 44%] Linking CXX shared library libpipe_log_engine.so
[ 44%] Building CXX object app/pod-agent/tests/CMakeFiles/pod-agent_test_ProtocolCommands.dir/Test_ProtocolCommands.cpp.o
[ 44%] Building CXX object app/pod-agent/tests/CMakeFiles/pod-agent_test_Protocol.dir/Test_Protocol.cpp.o
[ 44%] Linking CXX executable pod-ssh_test_threadpool
[ 44%] Built target MiscCommon_test_MiscUtils
[ 44%] Built target pod-ssh_test_config
[ 44%] Built target MiscCommon_test_FindCfgFile
[ 44%] Built target pod-ssh_test_threadpool
[ 44%] Built target pipe_log_engine
[ 46%] Linking CXX executable MiscCommon_test_SysHelper
[ 46%] Built target MiscCommon_test_SysHelper
[ 48%] Linking CXX shared library libproof_status_file.so
[ 48%] Built target proof_status_file
Scanning dependencies of target pod-agent
Scanning dependencies of target pod-agent_test_ProofStatusFile
[ 50%] Linking CXX executable MiscCommon_test_Process
[ 51%] Building CXX object app/pod-agent/tests/CMakeFiles/pod-agent_test_ProofStatusFile.dir/Test_ProofStatusFile.cpp.o
[ 57%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/AgentServer.cpp.o
[ 57%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/AgentBase.cpp.o
[ 57%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/Main.cpp.o
[ 61%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/AgentClient.cpp.o
[ 61%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/ThreadPool.cpp.o
[ 63%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/Node.cpp.o
[ 65%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/PROOFAgent.cpp.o
[ 65%] Built target MiscCommon_test_Process
[ 67%] Linking CXX executable pod-agent_test_Protocol
[ 67%] Built target pod-agent_test_Protocol
[ 69%] Linking CXX executable pod-agent_test_ProtocolCommands
[ 69%] Built target pod-agent_test_ProtocolCommands
[ 71%] Linking CXX executable pod-agent_test_ProofStatusFile
[ 73%] Linking CXX shared library libpod_sys_files.so
[ 73%] Built target pod-agent_test_ProofStatusFile
[ 73%] Built target pod_sys_files
Scanning dependencies of target pod-info
Scanning dependencies of target pod-ssh
Scanning dependencies of target pod-remote
[ 75%] Building CXX object app/pod-info/CMakeFiles/pod-info.dir/src/main.cpp.o
[ 78%] Building CXX object app/pod-info/CMakeFiles/pod-info.dir/src/Server.cpp.o
[ 78%] Building CXX object app/pod-info/CMakeFiles/pod-info.dir/src/SrvInfo.cpp.o
[ 86%] Building CXX object app/pod-ssh/CMakeFiles/pod-ssh.dir/src/main.cpp.o
[ 88%] Building CXX object app/pod-remote/CMakeFiles/pod-remote.dir/src/main.cpp.o
[ 90%] Building CXX object app/pod-remote/CMakeFiles/pod-remote.dir/src/MessageParser.cpp.o
[ 90%] Building CXX object app/pod-ssh/CMakeFiles/pod-ssh.dir/src/config.cpp.o
[ 90%] Building CXX object app/pod-ssh/CMakeFiles/pod-ssh.dir/src/worker.cpp.o
[ 90%] Building CXX object app/pod-remote/CMakeFiles/pod-remote.dir/src/Utils.cpp.o
[ 92%] Linking CXX executable pod-user-defaults
[ 92%] Built target pod-user-defaults
[ 94%] Linking CXX executable pod-agent
[ 94%] Built target pod-agent
[ 96%] Linking CXX executable pod-info
[ 96%] Built target pod-info
[ 98%] Linking CXX executable pod-remote
[ 98%] Built target pod-remote
[100%] Linking CXX executable pod-ssh
[100%] Built target pod-ssh
[iglez@fanae128 build]$ make -j wn_bin
[ 44%] Built target SSHTunnel
[ 44%] Built target pod-user-defaults
[ 50%] Built target proof_status_file
[ 50%] Built target pod_protocol
[ 94%] Built target pod-agent
Scanning dependencies of target wn_bin
[100%] Generate WN binary package
[100%] Built target wn_bin
[iglez@fanae128 build]$ make -j install
[ 30%] Built target SSHTunnel
[ 34%] Built target pod-ssh_test_threadpool
[ 48%] Built target pipe_log_engine
[ 48%] Built target MiscCommon_test_MiscUtils
[ 48%] Built target pod_sys_files
[ 48%] Built target MiscCommon_test_FindCfgFile
[ 50%] Built target pod_protocol
[ 50%] Built target pod-user-defaults
[ 50%] Built target MiscCommon_test_Process
[ 50%] Built target proof_status_file
[ 50%] Built target MiscCommon_test_SysHelper
[ 50%] Built target pod-ssh_test_config
[ 76%] Built target pod-agent_test_Protocol
[ 76%] Built target pod-agent_test_ProofStatusFile
[ 76%] Built target pod-agent_test_ProtocolCommands
[ 88%] Built target pod-ssh
[ 92%] Built target pod-info
[ 92%] Built target pod-remote
[100%] Built target pod-agent
Install the project...
-- Install configuration: "Release"
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/./LICENSE
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/./ReleaseNotes
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/xpd.cf.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/./PoD_env.sh
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.pbs.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.lsf.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.ge.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.condor
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.condor.option
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.slurm
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/PoDWorker.sh.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-server
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-check-update
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-prep-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-submit-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-clean-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-status-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-exec-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-keygen
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-remote-srv-info
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-addpayload
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/gLite.jdl
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/version
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-lsf-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-pbs-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-ge-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-condor-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-loadleveler-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-glite-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-panda-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-slurm-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/run_test.sh
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libpipe_log_engine.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libpod_protocol.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libproof_status_file.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libSSHTunnel.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/ssh-tunnel
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libpod_sys_files.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_MiscUtils
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_Process
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_SysHelper
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_FindCfgFile
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-agent
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-agent" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProtocolCommands
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProtocolCommands" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_Protocol
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_Protocol" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProofStatusFile
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProofStatusFile" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/xpd.cf
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-info
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-info" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-remote
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-remote" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-user-defaults
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-ssh
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-ssh" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-ssh_test_config
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-ssh_test_threadpool
[iglez@fanae128 build]$ source ../../PoD-master.slc6/PoD_env.sh
[iglez@fanae128 build]$ pod-server start
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
[iglez@fanae128 build]$ pod-server getbins
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
I am surely missing something here. Cheers, Isidro
Unfortunately I don't have the build environment for PoD anymore and can't build wn packages. I can recover the env. It shouldn't be a problem, but it will take some time. So, let's try to resolve it online, it should be faster.
Can you please try to build and install everything only for you (for your local user)? Let's try it for one user only. If this works, we will go further and find out how to make a shared installation.
Hi Anar, Sorry for the late answer (Easter holidays). It is a pity you don't have a build environment. I tried to follow the instructions for a user installation.
mkdir build
cd build/
cmake -C ../BuildSetup.cmake ..
make -j
make -j wn_bin
make -j install
And then I tried to start the server...
$ source ~/PoD/3.17.2.g2ebb/PoD_env.sh
$ pod-server start
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
with no success... same error as above!
If I list the content of the build directory I can see pod-wrk-bin-3.17.2.g2ebb-Linux-amd64.tar.gz, but I cannot find it in the installation path. So my guess is that make install is not moving it to the place where it is then expected.
Let me know if you need more details.
Addition
Even if I manually copy the tar.gz to the right place, $POD_INSTALLATION/bin/wn_bins, the pod-server script will look for the other architectures and try to download them if they are not found.
Hi @AnarManafov , Thinking a bit about this, it is strange that the system requires the worker binaries for all architectures, isn't it? Especially if I have a homogeneous cluster (which might not be the case).
Isidro
It is kind of a limitation of PoD. It always requires wn bins for all supported platforms, because I used to auto-build packages for every release (even nightly).
It is easy to workaround. But first of all, can you check the content of $POD_LOCATION/bin/wn_bins ? Does it exist and what is inside?
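The check asked for here can be scripted. The sketch below lists which worker tarballs are absent from a wn_bins directory. The three platform suffixes are an assumption: they are only the ones that appear in this thread's logs, and the authoritative list is presumably whatever pod-server checks for. `missing_wn_bins` is a hypothetical helper name.

```shell
#!/bin/sh
# List the worker tarballs that are absent from a wn_bins directory.
# Assumption: the platform suffixes below are just the ones seen in the
# logs of this thread, not a list taken from PoD itself.
missing_wn_bins() {
    dir=$1
    version=$2
    for platform in Linux-amd64 Linux-x86 Darwin-universal; do
        f="pod-wrk-bin-$version-$platform.tar.gz"
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example (POD_LOCATION is set by PoD_env.sh; the version string is the one
# from the download error above):
#   missing_wn_bins "$POD_LOCATION/bin/wn_bins" 3.17.2.g2ebb
```

An empty output would mean every tarball in the assumed list is present.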
Hi @AnarManafov , I see... Concerning your question: the folder exists, but it is empty. Cheers, Isidro
Ok, give me a moment to check something...
Ok, I got it. I was confusing PoD with its successor DDS (http://dds.gsi.de). DDS is much smarter and manages such things automatically.
Anyway, can you please try the following. If it works, let me know and I will adjust scripts to do it automatically.
Hi again, Just tested. A few things:
1) I have to correct myself. I would swear that $POD_LOCATION/bin/wn_bins was automatically created, but it wasn't in my test now. So it may well be that I created it manually in one of my previous tests.
2) Following your instructions I got it working...
3) Even if it works, I get an error when I execute pod-submit
$ pod-submit -r pbs -n 10 -q proof
PoDWorker.sh
/nfs/fanae/PoD_releases/PoD-master-test/build
qsub: submit error (Invalid request MSG=cannot locate new job 3933[].gae011.ciencias.uniovi.es (0 - Success))
Error submitting job.
Don't get confused by the error. The job is there and pod-info is able to connect to the workers.
$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
3933[].gae011.ciencias.uniovi pod iglez 0 R proof
$ pod-info -n
10
4) Is DDS ready to be used? If so, how difficult is the migration?
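Regarding item 3: because the job exists despite the qsub error, a wrapper could pull the job ID out of the message and confirm it with qstat instead of treating the submission as failed. This is a minimal sketch under that assumption; `extract_job_id` is a hypothetical helper, and the message shape is taken only from the error quoted in this thread.

```shell
#!/bin/sh
# Pull the job ID out of a "cannot locate new job" message read from stdin.
# Prints nothing if the message has a different shape.
extract_job_id() {
    sed -n 's/.*cannot locate new job \([^ ]*\) .*/\1/p'
}

# Hypothetical use: capture pod-submit's output, then ask qstat about the job.
#   out=$(pod-submit -r pbs -n 10 -q proof 2>&1)
#   id=$(printf '%s\n' "$out" | extract_job_id)
#   [ -n "$id" ] && qstat "$id"
```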
Thanks!
Isidro
DDS is in production state. It is used by ALICE (AlFa project). Unfortunately there is no integration with PROOF. It has been discussed multiple times, but no one volunteered to implement the support. My dev. estimate is about 1-2 man-weeks. It can be done even by a student. I proposed it to the ROOT team as well, but they don't seem interested in keeping PROOF alive. What can I do then :( DDS is very easy to use. But unlike PoD it doesn't know about the tasks it starts, so PROOF needs to be adjusted to start using DDS. As for the users, it would be much easier to use DDS than PoD. Much easier.
I see... unfortunately I have no time in the near future to help here, Anar. Sorry about that :( It is a pity they are dropping support for PROOF.
Concerning PoD, now that the issues are identified, how to set it up globally for all the users in my lab?
Regarding PROOF: yes, it is very unfortunate that ROOT is slowly dropping PROOF. I think PROOF is very handy for end users when you need to quickly run your analysis on several machines.
Regarding this issue in PoD: just try to copy this (your PoD) installation as it is to a shared location. It should do the job.
Let me know if something comes up.
With Torque-4.2.7, qsub has a check which detects whether \r (or ^M) exists in the submission file.
As such, with PoD,
desilva@melui1:~$ pod-submit -r pbs -n 2 -q mel_short
PoDWorker.sh
~
qsub: script is written in DOS/Windows text format
Error submitting job.
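A quick way to see whether a generated job script might trip this check is to look for carriage-return bytes in it. The following is a minimal sketch; `has_cr` is a hypothetical helper, and treating "any \r anywhere in the file" as the trigger is an assumption based on the description above, not on the Torque source.

```shell
#!/bin/sh
# Succeed if the file contains at least one carriage return (\r, shown as ^M).
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Example (job.pbs is a placeholder for the script handed to qsub):
#   has_cr job.pbs && echo "job.pbs contains carriage returns"
```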
To reproduce:
setupATLAS
localSetupPoD PoD-3.16p1-python2.7-x86_64-slc6-gcc47-boost1.55
pod-server start
qstat -q
pod-submit -r pbs -n 2 -q mel_short
regards, Asoka