AnarManafov / PoD

http://pod.gsi.de
GNU General Public License v2.0

Torque-4.2.7 and pod-submit failing #1

Closed desilva2185 closed 7 years ago

desilva2185 commented 10 years ago

With Torque-4.2.7, qsub checks whether the submission file contains \r (or ^M) characters and rejects it if it does.

As such, with PoD,

desilva@melui1:~$ pod-submit -r pbs -n 2 -q mel_short
PoDWorker.sh
~
qsub: script is written in DOS/Windows text format
Error submitting job.
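To see whether a job script contains the carriage-return bytes that trigger this message, a small helper along the following lines may help. This is only a sketch: `has_cr` is a hypothetical name, not part of PoD or Torque, and it merely approximates whatever check qsub performs internally.

```shell
# has_cr FILE
# Hypothetical helper (not part of PoD or Torque): succeeds if FILE
# contains at least one carriage-return (\r, ^M) byte.
has_cr() {
    cr=$(printf '\r')      # a literal CR byte, portable across POSIX shells
    grep -q "$cr" "$1"     # exit status 0 if any line contains a CR
}
```

For example, `has_cr PoDWorker.sh && echo "contains CR"` would flag a script that qsub is likely to reject.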

To reproduce:

setupATLAS
localSetupPoD PoD-3.16p1-python2.7-x86_64-slc6-gcc47-boost1.55
pod-server start
qstat -q
pod-submit -r pbs -n 2 -q mel_short

regards, Asoka

AnarManafov commented 10 years ago

I think I saw this once and got it fixed somehow. I guess the root of the problem is a multilevel binary attachment, which PoD injects into the body of the job script.

I will look for a possibility to silence PBS.

AnarManafov commented 10 years ago

In a minute I will build a version with this fix for you to test.

desilva2185 commented 10 years ago

Thanks !

regards, Asoka

AnarManafov commented 10 years ago

Can you please try this version to make sure it works on your Torque env? http://pod.gsi.de/releases/pod/nightly/PoD-3.16.1.g43df-Source.tar.gz

Please let me know whether it fixes the issue. Feel free to re-open if the issue persists.

desilva2185 commented 10 years ago

Ok, will try and let you know but it may be next week (just came back and am swamped.)

regards, Asoka

AnarManafov commented 10 years ago

Sounds good, Asoka! Meanwhile I will investigate the other issue you reported.

desilva2185 commented 10 years ago

Hi Anar,

Finally got the chance to test this. Apparently this depends on uuencode being available and this is not installed by default on vanilla SL6 OS.

Some sites have it (lxplus, TRIUMF T1 and T3) while others do not (Australia testbed and my testbed).

Is there an alternative or is this a requirement ?
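A quick way to see which nodes lack the tools is to test for them on PATH. The sketch below assumes nothing about PoD itself; `missing_tools` is a hypothetical helper name introduced only for illustration.

```shell
# missing_tools TOOL...
# Hypothetical helper: prints each named tool that is not on PATH and
# returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `missing_tools uuencode uudecode` on a worker node prints nothing when both tools are available. On SL6-like systems they are provided by the sharutils package.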

Thanks !

regards, Asoka

iglezh commented 7 years ago

Hi Anar, I have hit this problem with my latest installation of Torque/Maui... is the solution you propose still valid or is there something better? I am using PoD 3.16.

Isidro

AnarManafov commented 7 years ago

Hi Isidro,

I haven't had a chance to investigate other solutions. Please try this one. If it doesn't help, we will figure something else out.

iglezh commented 7 years ago

Hi Anar, It seems to work provided I install uuencode on all my nodes (through the sharutils package in SLC6). I am now facing another problem when submitting, with the following error message:

qsub: submit error (Invalid request MSG=cannot locate new job 17[].xxxxx.uniovi.es (0 - Success))

No idea why. Cheers, Isidro

iglezh commented 7 years ago

Hi Anar, Actually I traced the problem to some issue with PoD. It seems the pod-worker file is nowhere in my installation (or in my area) and it fails when the submitted script tries to copy it. Any hint?

Update: If I understand correctly, what I just reported is not a problem, but a way to check whether we are using shared folders. Going further down in the log, it seems there is some issue with tar:

Writing files in node's directory  /tmp/iglez/PoD_2Cq4hsLUrd
cp: cannot stat `/mnt_pool/fanae105/user/iglez/.PoD/wrk/pod-worker': No such file or directory
uudecode: stdin: Short file

gzip: stdin: unexpected end of file
PoDWorker.sh
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
...
...
***     [Fri, 24 Mar 2017 17:39:19 +0100]       +++ PoD Worker START +++
***     [Fri, 24 Mar 2017 17:39:19 +0100]       Current working directory: /tmp/iglez/PoD_2Cq4hsLUrd
***     [Fri, 24 Mar 2017 17:39:19 +0100]       Untar payload...
uudecode: stdin: Short file

gzip: stdin: unexpected end of file
xpd.cf
PoD.cfg
version
server_info.cfg
pod-wrk-bin-3.16.1.g43df-Darwin-universal.tar.gz
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
***     [Fri, 24 Mar 2017 17:39:19 +0100]       host's CPU/instruction set: amd64
***     [Fri, 24 Mar 2017 17:39:19 +0100]       PoD worker runs on Linux-x86_64
***     [Fri, 24 Mar 2017 17:39:19 +0100]       Error: Can't find WN pre-compiled bin.: /tmp/iglez/PoD_2Cq4hsLUrd/pod-wrk-bin-3.16.1.g43df-Linux-amd64.tar.gz
***     [Fri, 24 Mar 2017 17:39:19 +0100]       Starting the cleaning procedure...
***     [Fri, 24 Mar 2017 17:39:19 +0100]       Gracefully shut down PoD worker process(es):  
***     [Fri, 24 Mar 2017 17:39:19 +0100]       done cleaning up.

So, it may be that tar and uuencode/uudecode are not working properly?
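One way to rule out the tools themselves is a standalone round trip through tar, gzip, uuencode and uudecode. The sketch below is an assumption-laden self-test, not how PoD embeds its payload: `payload_roundtrip` and the file names inside it are hypothetical. The "Short file" message typically means uudecode reached the end of its input before the closing `end` line, so if this test passes, a truncated payload is a more likely cause than broken tools.

```shell
# payload_roundtrip DIR
# Hypothetical self-test: packs DIR as tar.gz, uuencodes it, decodes it
# again and lists the archive. Succeeds only if every step succeeds.
payload_roundtrip() {
    src=$1
    work=$(mktemp -d) || return 1
    tar -czf "$work/payload.tgz" -C "$src" . &&
        uuencode "$work/payload.tgz" payload.tgz > "$work/payload.uu" &&
        uudecode -o "$work/decoded.tgz" "$work/payload.uu" &&
        cmp -s "$work/payload.tgz" "$work/decoded.tgz" &&
        tar -tzf "$work/decoded.tgz" > /dev/null
    rc=$?                  # status of the whole && chain
    rm -rf "$work"
    return "$rc"
}
```

Calling `payload_roundtrip ~/.PoD/wrk` on a worker node would exercise the same tools on that node. The directory is only an example taken from the log above.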

Isidro

AnarManafov commented 7 years ago

Is this folder "/mnt_pool/fanae105/user/iglez" actually shared between the submit host and WNs on your PBS?

If yes, can you please pack and send me your ~/.PoD dir?

iglezh commented 7 years ago

Hi Anar, Yep. That is the home folder and it is shared. Actually the problem seems to be (if I traced it back properly) in the uudecode call, which seems unable to deal with the payload. I added "set -x" here and there in the scripts to get the maximum output when they are run. I have just sent you by mail the .PoD folder (tar.gz) and the output of one of the failing jobs.

Cheers,

Isidro

AnarManafov commented 7 years ago

Isidro, man, can you please send me the email with the archive again? I lost it somehow. Can't find it anymore. :(

iglezh commented 7 years ago

Hi Anar,

Just resent it a minute ago. Let me know if you receive it. Thanks a lot for your help!

Isidro


AnarManafov commented 7 years ago

Isidro, just for my info. Did you try the current master or a tagged 3.16?

iglezh commented 7 years ago

Hi Anar, I tried the version you suggested above from this link:

http://pod.gsi.de/releases/pod/nightly/PoD-3.16.1.g43df-Source.tar.gz

I added set -x to many scripts to get very verbose output, and I added a line at the end of the payload after encoding, because in a couple of trials I could not decode the file otherwise. Without modifications it did not work either.

Cheers,

Isidro

AnarManafov commented 7 years ago

Ok, thanks. I think I found the bug. Looking for a proper fix now...

AnarManafov commented 7 years ago

Isidro, please try the current master. The issue should be fixed by d3f67dd6db11d03fe2db05835714f635c6cad60a.

Feel free to reopen if the issue persists.

iglezh commented 7 years ago

Hi Anar, Now I am having problems compiling. Actually the problem seems to come from cmake. I have tried three different versions of cmake with similar results. The output is

loading initial cache file ../BuildSetup.cmake
CMake Warning (dev) in CMakeLists.txt:
  Syntax Warning in cmake code at

    /nfs/fanae/PoD_releases/PoD-master-Source/CMakeLists.txt:102:28

  Argument not separated from preceding token by whitespace.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) in CMakeLists.txt:
  Syntax Warning in cmake code at

    /nfs/fanae/PoD_releases/PoD-master-Source/CMakeLists.txt:105:28

  Argument not separated from preceding token by whitespace.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- The C compiler identification is GNU 5.3.0
-- The CXX compiler identification is GNU 5.3.0
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Doxygen: /usr/bin/doxygen (found version "1.6.1") 
CMAKE VERSION 2.8.12.2
-- Boost version: 1.41.0
-- Boost version: 1.41.0
-- Found the following Boost libraries:
--   thread
--   program_options
--   filesystem
--   system
--   unit_test_framework
CMake Warning (dev) at CMakeLists.txt:260 (add_subdirectory):
  The source directory

    /nfs/fanae/PoD_releases/PoD-master-Source/app/MiscCommon

  does not contain a CMakeLists.txt file.

  CMake does not support this case but it used to work accidentally and is
  being allowed for compatibility.

  Policy CMP0014 is not set: Input directories must have CMakeLists.txt.  Run
  "cmake --help-policy CMP0014" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Build pod-agent - YES
-- Build pod-agent unit tests - YES
-- Build pod-info - YES
-- Build pod-remote - YES
-- Build pod-user-defaults - YES
-- Build pod-ssh - YES
-- Build pod-ssh unit tests - YES
-- Configuring done
CMake Error at CMakeLists.txt:108 (add_custom_target):
  Error evaluating generator expression:

    $<TARGET_FILE:pod_protocol>

  No target "pod_protocol"

CMake Error at CMakeLists.txt:108 (add_custom_target):
  Error evaluating generator expression:

    $<TARGET_FILE:proof_status_file>

  No target "proof_status_file"

CMake Error at CMakeLists.txt:108 (add_custom_target):
  Error evaluating generator expression:

    $<TARGET_FILE:SSHTunnel>

  No target "SSHTunnel"

-- Generating done
-- Build files have been written to: /nfs/fanae/PoD_releases/PoD-master-Source/build

AnarManafov commented 7 years ago

I have updated the cmake cfg to support modern versions of cmake.

Also, you forgot to execute "git submodule update --init". This is the reason for:

/nfs/fanae/PoD_releases/PoD-master-Source/app/MiscCommon

  does not contain a CMakeLists.txt file.
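A simple pre-flight check before running cmake can catch the missing submodule early. This is a sketch under the assumption that app/MiscCommon, the directory named in the warning, is the one populated by the submodule; `check_submodules` is a hypothetical helper name.

```shell
# check_submodules SRC_DIR
# Hypothetical pre-flight check: fails if app/MiscCommon (the directory
# named in the CMake warning) has no CMakeLists.txt.
check_submodules() {
    src=$1
    if [ ! -f "$src/app/MiscCommon/CMakeLists.txt" ]; then
        echo "app/MiscCommon has no CMakeLists.txt; run 'git submodule update --init' in $src" >&2
        return 1
    fi
}
```

For instance, `check_submodules .. && cmake -C ../BuildSetup.cmake ..` from the build directory would only configure once the submodule is in place.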

iglezh commented 7 years ago

Hi again Anar, Thanks a lot for the quick answer! Now everything builds clean and smooth (no warnings). BTW, I could not find the git submodule... command in the documentation.

Now the next step, getting the binaries, is where it fails :(

$ pod-server getbins
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of  http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz

I guess they are in http://pod.gsi.de/releases/add/3.17/, so it is probably just a question of changing etc/version. Right?

Cheers,

Isidro

AnarManafov commented 7 years ago

Since you build from source, you also need to build the WN bins. This is a pretty hidden feature (need to document :) ).

make -j wn_bin
make -j install

iglezh commented 7 years ago

Hi Anar again, There is still something missing. After executing those two commands (I even retried everything from the beginning) I get errors:

[iglez@fanae128 build]$ make -j wn_bin
[ 30%] [ 38%] [ 38%] Built target SSHTunnel
Built target proof_status_file
Built target pod-user-defaults
[ 38%] Built target pod_protocol
[ 92%] Built target pod-agent
[100%] Generate WN binary package
[100%] Built target wn_bin
[iglez@fanae128 build]$ pod-server getbins
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of  http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
[iglez@fanae128 build]$ pod-server start
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of  http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz

I see the binary in the build directory:

[iglez@fanae128 build]$ ls
app             CMakeFiles           CPackConfig.cmake        install_manifest.txt  pod-wrk-bin
CMakeCache.txt  cmake_install.cmake  CPackSourceConfig.cmake  Makefile              pod-wrk-bin-3.17.2.g2ebb-Linux-amd64.tar.gz

But I cannot find it in the installation path. A configuration missing?

Cheers,

Isidro

AnarManafov commented 7 years ago

No problem, man!

The order is important. You have to build wn_bin and then install it. If you just build the wn_bin target, it won't get installed. You need to trigger the install target as well. So, try it as I posted above:

make -j wn_bin
make -j install

iglezh commented 7 years ago

Hi Anar, Thanks again for the fast answer. I was not clear in my previous post. But I had done that and it doesn't fix the problem. I posted the result so you see in the output it was executed (nothing to redo).

I even tried it from scratch: cmake + make + make wn_bin + make install with no success.

Isidro

AnarManafov commented 7 years ago

Indeed, my bad. I failed to check the rest of the message. Sorry for that.

The getbins argument is only for central installations of released PoD versions. Since you built a version on your own, you should have wn_bin for your system already. So, just go ahead and try to start the server.

Please let me know the result. I don't have an environment at the moment, so I can't check it myself. It has been a long time since I touched PoD :(

iglezh commented 7 years ago

Hi Anar,

I am aiming at a central installation (/nfs/fanae/PoD_releases/PoD-master.slc6) from the source code (/nfs/fanae/PoD_releases/PoD-master-Source) since this is used by a few people on our farm combined with our framework (PAF). I followed your instructions on a clean new area and got again the same issues. Full log below:

[iglez@fanae128 PoD_releases]$  grep INSTALL PoD-master-Source/BuildSetup.cmake 
SET (CMAKE_INSTALL_PREFIX "/nfs/fanae/PoD_releases/PoD-master.slc6" CACHE PATH "Install path prefix, prepended onto install directories." FORCE)
[iglez@fanae128 PoD_releases]$ mkdir PoD-master.slc6
[iglez@fanae128 PoD_releases]$ cd PoD-master-Source/
[iglez@fanae128 PoD-master-Source]$  mkdir build
[iglez@fanae128 PoD-master-Source]$ cd build/
[iglez@fanae128 build]$ cmake -C ../BuildSetup.cmake ..
loading initial cache file ../BuildSetup.cmake
-- The C compiler identification is GNU 5.3.0
-- The CXX compiler identification is GNU 5.3.0
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc
-- Check for working C compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++
-- Check for working CXX compiler: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Doxygen: /usr/bin/doxygen (found version "1.6.1") 
-- Using BOOST Library dir: 
-- Boost version: 1.41.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Boost version: 1.41.0
-- Found the following Boost libraries:
--   thread
--   program_options
--   filesystem
--   system
--   unit_test_framework
--   date_time
-- Build the pipe_log_engine lib - YES
-- Build the pod_protocol lib - YES
-- Build the proof_status_file lib - YES
-- Build the SSHTunnel lib - YES
-- Build the pod_sys_files lib - YES
-- Build MiscCommon unit tests - YES
-- Build pod-agent - YES
-- Build pod-agent unit tests - YES
-- Build pod-info - YES
-- Build pod-remote - YES
-- Build pod-user-defaults - YES
-- Build pod-ssh - YES
-- Build pod-ssh unit tests - YES
-- Configuring done
-- Generating done
-- Build files have been written to: /nfs/fanae/PoD_releases/PoD-master-Source/build
[iglez@fanae128 build]$ make -j
Scanning dependencies of target proof_status_file
Scanning dependencies of target MiscCommon_test_MiscUtils
Scanning dependencies of target pod_sys_files
Scanning dependencies of target pod_protocol
Scanning dependencies of target pod-ssh_test_config
Scanning dependencies of target MiscCommon_test_SysHelper
Scanning dependencies of target pod-ssh_test_threadpool
Scanning dependencies of target MiscCommon_test_Process
Scanning dependencies of target SSHTunnel
Scanning dependencies of target MiscCommon_test_FindCfgFile
Scanning dependencies of target pipe_log_engine
Scanning dependencies of target pod-user-defaults
[  9%] Building CXX object app/MiscCommon/proof_status_file/CMakeFiles/proof_status_file.dir/ProofStatusFile.cpp.o
[  9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_FindCfgFile.dir/Test_FindCfgFile.cpp.o
[  9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_SysHelper.dir/Test_SysHelper.cpp.o
[  9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_Process.dir/Test_Process.cpp.o
[  9%] Building CXX object app/MiscCommon/tests/CMakeFiles/MiscCommon_test_MiscUtils.dir/Test_MiscUtils.cpp.o
[ 25%] Building CXX object app/MiscCommon/pipe_log_engine/CMakeFiles/pipe_log_engine.dir/logEngine.cpp.o
[ 25%] Building CXX object app/pod-ssh/tests/CMakeFiles/pod-ssh_test_threadpool.dir/test_threadpool.cpp.o
[ 25%] Building CXX object app/pod-ssh/tests/CMakeFiles/pod-ssh_test_config.dir/test_config.cpp.o
[ 25%] Building CXX object app/MiscCommon/pod_protocol/CMakeFiles/pod_protocol.dir/ProtocolCommands.cpp.o
[ 25%] Building CXX object app/MiscCommon/pod_sys_files/CMakeFiles/pod_sys_files.dir/PoDSysFiles.cpp.o
[ 25%] Building CXX object app/MiscCommon/SSHTunnel/CMakeFiles/SSHTunnel.dir/SSHTunnel.cpp.o
[ 25%] Building CXX object app/MiscCommon/pod_protocol/CMakeFiles/pod_protocol.dir/Protocol.cpp.o
[ 25%] Building CXX object app/pod-ssh/tests/CMakeFiles/pod-ssh_test_config.dir/__/src/config.cpp.o
[ 26%] Building CXX object app/pod-user-defaults/CMakeFiles/pod-user-defaults.dir/src/main.cpp.o
[ 28%] Linking CXX shared library libSSHTunnel.so
[ 30%] Linking CXX shared library libpod_protocol.so
[ 30%] Built target SSHTunnel
[ 32%] Linking CXX executable MiscCommon_test_MiscUtils
[ 34%] Linking CXX executable pod-ssh_test_config
[ 34%] Built target pod_protocol
Scanning dependencies of target pod-agent_test_Protocol
Scanning dependencies of target pod-agent_test_ProtocolCommands
[ 36%] Linking CXX executable MiscCommon_test_FindCfgFile
[ 44%] Linking CXX shared library libpipe_log_engine.so
[ 44%] Building CXX object app/pod-agent/tests/CMakeFiles/pod-agent_test_ProtocolCommands.dir/Test_ProtocolCommands.cpp.o
[ 44%] Building CXX object app/pod-agent/tests/CMakeFiles/pod-agent_test_Protocol.dir/Test_Protocol.cpp.o
[ 44%] Linking CXX executable pod-ssh_test_threadpool
[ 44%] Built target MiscCommon_test_MiscUtils
[ 44%] Built target pod-ssh_test_config
[ 44%] Built target MiscCommon_test_FindCfgFile
[ 44%] Built target pod-ssh_test_threadpool
[ 44%] Built target pipe_log_engine
[ 46%] Linking CXX executable MiscCommon_test_SysHelper
[ 46%] Built target MiscCommon_test_SysHelper
[ 48%] Linking CXX shared library libproof_status_file.so
[ 48%] Built target proof_status_file
Scanning dependencies of target pod-agent
Scanning dependencies of target pod-agent_test_ProofStatusFile
[ 50%] Linking CXX executable MiscCommon_test_Process
[ 51%] Building CXX object app/pod-agent/tests/CMakeFiles/pod-agent_test_ProofStatusFile.dir/Test_ProofStatusFile.cpp.o
[ 57%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/AgentServer.cpp.o
[ 57%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/AgentBase.cpp.o
[ 57%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/Main.cpp.o
[ 61%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/AgentClient.cpp.o
[ 61%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/ThreadPool.cpp.o
[ 63%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/Node.cpp.o
[ 65%] Building CXX object app/pod-agent/CMakeFiles/pod-agent.dir/src/PROOFAgent.cpp.o
[ 65%] Built target MiscCommon_test_Process
[ 67%] Linking CXX executable pod-agent_test_Protocol
[ 67%] Built target pod-agent_test_Protocol
[ 69%] Linking CXX executable pod-agent_test_ProtocolCommands
[ 69%] Built target pod-agent_test_ProtocolCommands
[ 71%] Linking CXX executable pod-agent_test_ProofStatusFile
[ 73%] Linking CXX shared library libpod_sys_files.so
[ 73%] Built target pod-agent_test_ProofStatusFile
[ 73%] Built target pod_sys_files
Scanning dependencies of target pod-info
Scanning dependencies of target pod-ssh
Scanning dependencies of target pod-remote
[ 75%] Building CXX object app/pod-info/CMakeFiles/pod-info.dir/src/main.cpp.o
[ 78%] Building CXX object app/pod-info/CMakeFiles/pod-info.dir/src/Server.cpp.o
[ 78%] Building CXX object app/pod-info/CMakeFiles/pod-info.dir/src/SrvInfo.cpp.o
[ 86%] Building CXX object app/pod-ssh/CMakeFiles/pod-ssh.dir/src/main.cpp.o
[ 88%] Building CXX object app/pod-remote/CMakeFiles/pod-remote.dir/src/main.cpp.o
[ 90%] Building CXX object app/pod-remote/CMakeFiles/pod-remote.dir/src/MessageParser.cpp.o
[ 90%] Building CXX object app/pod-ssh/CMakeFiles/pod-ssh.dir/src/config.cpp.o
[ 90%] Building CXX object app/pod-ssh/CMakeFiles/pod-ssh.dir/src/worker.cpp.o
[ 90%] Building CXX object app/pod-remote/CMakeFiles/pod-remote.dir/src/Utils.cpp.o
[ 92%] Linking CXX executable pod-user-defaults
[ 92%] Built target pod-user-defaults
[ 94%] Linking CXX executable pod-agent
[ 94%] Built target pod-agent
[ 96%] Linking CXX executable pod-info
[ 96%] Built target pod-info
[ 98%] Linking CXX executable pod-remote
[ 98%] Built target pod-remote
[100%] Linking CXX executable pod-ssh
[100%] Built target pod-ssh
[iglez@fanae128 build]$ make -j wn_bin
[ 44%] Built target SSHTunnel
[ 44%] Built target pod-user-defaults
[ 50%] Built target proof_status_file
[ 50%] Built target pod_protocol
[ 94%] Built target pod-agent
Scanning dependencies of target wn_bin
[100%] Generate WN binary package
[100%] Built target wn_bin
[iglez@fanae128 build]$ make -j install
[ 30%] Built target SSHTunnel
[ 34%] Built target pod-ssh_test_threadpool
[ 48%] Built target pipe_log_engine
[ 48%] Built target MiscCommon_test_MiscUtils
[ 48%] Built target pod_sys_files
[ 48%] Built target MiscCommon_test_FindCfgFile
[ 50%] Built target pod_protocol
[ 50%] Built target pod-user-defaults
[ 50%] Built target MiscCommon_test_Process
[ 50%] Built target proof_status_file
[ 50%] Built target MiscCommon_test_SysHelper
[ 50%] Built target pod-ssh_test_config
[ 76%] Built target pod-agent_test_Protocol
[ 76%] Built target pod-agent_test_ProofStatusFile
[ 76%] Built target pod-agent_test_ProtocolCommands
[ 88%] Built target pod-ssh
[ 92%] Built target pod-info
[ 92%] Built target pod-remote
[100%] Built target pod-agent
Install the project...
-- Install configuration: "Release"
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/./LICENSE
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/./ReleaseNotes
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/xpd.cf.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/./PoD_env.sh
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.pbs.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.lsf.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.ge.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.condor
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.condor.option
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/Job.slurm
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/PoDWorker.sh.in
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-server
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-check-update
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-prep-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-submit-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-clean-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-status-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-exec-worker
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-ssh-keygen
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-remote-srv-info
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/pod-addpayload
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/gLite.jdl
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/etc/version
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-lsf-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-pbs-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-ge-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-condor-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-loadleveler-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-glite-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-panda-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/plugins/cli/pod-slurm-submit
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/run_test.sh
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libpipe_log_engine.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libpod_protocol.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libproof_status_file.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libSSHTunnel.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/private/ssh-tunnel
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/lib/libpod_sys_files.so
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_MiscUtils
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_Process
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_SysHelper
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/MiscCommon_test_FindCfgFile
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-agent
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-agent" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProtocolCommands
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProtocolCommands" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_Protocol
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_Protocol" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProofStatusFile
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-agent_test_ProofStatusFile" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/xpd.cf
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-info
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-info" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-remote
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-remote" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-user-defaults
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-ssh
-- Set runtime path of "/nfs/fanae/PoD_releases/PoD-master.slc6/bin/pod-ssh" to ""
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-ssh_test_config
-- Installing: /nfs/fanae/PoD_releases/PoD-master.slc6/tests/pod-ssh_test_threadpool
[iglez@fanae128 build]$ source ../../PoD-master.slc6/PoD_env.sh 
[iglez@fanae128 build]$ pod-server start
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of  http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz
[iglez@fanae128 build]$ pod-server getbins
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of  http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz

I am surely missing something here. Cheers, Isidro

AnarManafov commented 7 years ago

Unfortunately I don't have the build environment for PoD anymore and can't build wn packages. I can recover the env. It shouldn't be a problem, but it will take some time. So, let's try to resolve it online, it should be faster.

Can you please try to build and install everything only for you (for your local user)? Let's try it for one user only. If this works, we will go further and find out how to make a shared installation.

iglezh commented 7 years ago

Hi Anar, Sorry for the late answer (Easter holidays). It is a pity you don't have a build environment. I tried to follow the instructions for a user installation.

mkdir build
cd build/
cmake -C ../BuildSetup.cmake ..
make -j
make -j wn_bin
make -j install

And then I tried to start the server...

$ source ~/PoD/3.17.2.g2ebb/PoD_env.sh 
$ pod-server start
WNs pre-compiled binaries are missing.
Downloading WNs pre-compiled binaries...
Error: Can't download pre-compiled binaries for WNs.
Please check availability of  http://pod.gsi.de/releases/add/3.17.2.g2ebb/pod-wrk-bin-3.17.2.g2ebb-Linux-x86.tar.gz

with no success... same error as above! If I list the content of the build directory I can see the pod-wrk-bin-3.17.2.g2ebb-Linux-amd64.tar.gz, but I cannot find it in the installation path. So my guess is that make install is not moving it to the right place where it is then expected.

Let me know if you need more details.

Addition: Even if I manually copy the tar.gz to the right place, $POD_INSTALLATION/bin/wn_bins, the pod-server script will look for the other architectures and try to download them if they are not found.

iglezh commented 7 years ago

Hi @AnarManafov, Thinking a bit about this, it is strange that the system requires the worker binaries for all architectures, isn't it? Especially if I have a homogeneous cluster (which might not be the case).

Isidro

AnarManafov commented 7 years ago

It is kind of a limitation of PoD. It always requires wn bins for all supported platforms, because I used to auto-build packages for every release (even nightly).

It is easy to work around. But first of all, can you check the content of $POD_LOCATION/bin/wn_bins? Does it exist and what is inside?

iglezh commented 7 years ago

Hi @AnarManafov, I see... Concerning your question: the folder exists, but it is empty. Cheers, Isidro

AnarManafov commented 7 years ago

Ok, give me a moment to check something...

AnarManafov commented 7 years ago

Ok, I got it. I was confusing PoD with its successor DDS (http://dds.gsi.de). DDS is much smarter and manages such things automatically.

Anyway, can you please try the following. If it works, let me know and I will adjust scripts to do it automatically.

  1. manually copy the worker package from the build dir into $POD_LOCATION/bin/wn_bins
  2. in the $POD_LOCATION/bin/pod-server script comment out line 456
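Step 1 can be scripted roughly as follows. This is a sketch, not an official PoD script: `install_wn_bins` is a hypothetical helper, and the `pod-wrk-bin-*.tar.gz` pattern is inferred from the package name seen in the build directory earlier in this thread. It creates the wn_bins directory first in case it does not exist. Step 2 is not covered, since it depends on the exact contents of the pod-server script.

```shell
# install_wn_bins BUILD_DIR POD_LOCATION
# Hypothetical helper for step 1: copies every pod-wrk-bin-*.tar.gz
# from BUILD_DIR into POD_LOCATION/bin/wn_bins, creating it if needed.
install_wn_bins() {
    build_dir=$1
    dest=$2/bin/wn_bins
    found=0
    mkdir -p "$dest" || return 1
    for pkg in "$build_dir"/pod-wrk-bin-*.tar.gz; do
        [ -f "$pkg" ] || continue      # the glob matched nothing
        cp "$pkg" "$dest/" || return 1
        found=1
    done
    [ "$found" -eq 1 ]                 # fail if no package was found
}
```

After sourcing PoD_env.sh, this could be called from the build directory as `install_wn_bins . "$POD_LOCATION"`.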

iglezh commented 7 years ago

Hi again, Just tested. A few things:

1) I have to correct myself. I would swear that $POD_LOCATION/bin/wn_bins was created automatically, but it wasn't in my test now. So it may well be that I created it manually in one of my previous tests.
2) Following your instructions I got it working...
3) Even though it works, I get an error when I execute pod-submit

$ pod-submit -r pbs -n 10 -q proof
PoDWorker.sh
/nfs/fanae/PoD_releases/PoD-master-test/build
qsub: submit error (Invalid request MSG=cannot locate new job 3933[].gae011.ciencias.uniovi.es (0 - Success))
Error submitting job.

Don't get confused by the error. The job is there and pod-info is able to connect to the workers.

$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
3933[].gae011.ciencias.uniovi pod              iglez                  0 R proof 
$ pod-info -n
10

4) Is DDS ready to be used? If so, how difficult is the migration?
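Regarding point 3: since the qsub message does not reflect the real job state, one way to confirm a submission is to poll `pod-info -n` until the expected number of workers is reported. The sketch below is an illustration only. `wait_for_workers` and the `POD_INFO` override variable are hypothetical names, and the 5-second polling interval is an arbitrary choice.

```shell
# wait_for_workers EXPECTED TIMEOUT_SECONDS
# Hypothetical helper: polls "pod-info -n" until at least EXPECTED
# workers are reported, or fails once TIMEOUT_SECONDS have elapsed.
# POD_INFO is a hypothetical override for the command name.
wait_for_workers() {
    expected=$1
    timeout=$2
    elapsed=0
    while :; do
        count=$("${POD_INFO:-pod-info}" -n 2>/dev/null) || count=0
        case $count in
            ''|*[!0-9]*) count=0 ;;    # treat non-numeric output as zero
        esac
        if [ "$count" -ge "$expected" ]; then
            echo "$count"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
}
```

For the submission above, `wait_for_workers 10 300` would return successfully as soon as all ten workers are connected.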

Thanks!

Isidro

AnarManafov commented 7 years ago

DDS is in production state. It is used by ALICE (AlFa project). Unfortunately there is no integration with PROOF. It has been discussed multiple times, but no one volunteered to implement the support. My dev. estimate is about 1-2 man-weeks. It could even be done by a student. I proposed it to the ROOT team as well, but they don't seem interested in keeping PROOF alive. What can I do then :( DDS is very easy to use. But unlike PoD, it doesn't know about the tasks it starts, so PROOF needs to be adjusted to start using DDS. As for the users, it would be much easier to use DDS than PoD. Much easier.

iglezh commented 7 years ago

I see... Unfortunately I have no time in the near future to help here, Anar. Sorry about that :( It is a pity they are dropping support for PROOF.

Concerning PoD, now that the issues are identified, how do I set it up globally for all the users in my lab?

AnarManafov commented 7 years ago

Reg. PROOF. Yes, it is very unfortunate that ROOT is slowly dropping PROOF. I think PROOF is very handy for end users when you need to quickly run your analysis on several machines.

Reg. this issue in PoD. Just try to copy this (your PoD) installation as it is to a shared location. It should do the job.

Let me know if something comes up.