flowgrad / scoop

Automatically exported from code.google.com/p/scoop
GNU Lesser General Public License v3.0

Problem with long hostname resolution #10

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

1. Run a multi-node scoop job using fully qualified domain names in the --hosts argument, e.g.,
python -m scoop.__main__ --backend ZMQ -vv --hosts node1.default.domain node2.default.domain -n 32 scoopCode.py

What is the expected output? 

I would expect this command
python -m scoop.__main__ --backend ZMQ -vv --hosts node1.default.domain node2.default.domain -n 32 scoopCode.py

to do the same thing as this command
python -m scoop.__main__ --backend ZMQ -vv --hosts node1 node2 -n 32 scoopCode.py

What do you see instead?

Using long host names, I get the following error:

ERROR:root:Error while launching SCOOP subprocesses:
ERROR:root:Traceback (most recent call last):
  File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launcher.py", line 469, in main
    rootTaskExitCode = thisScoopApp.run()
  File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launcher.py", line 258, in run
    backend=self.backend,
  File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launch/brokerLaunch.py", line 148, in __init__
    "SSH process stderr:\n{stderr}".format(**locals()))
Exception: Could not successfully launch the remote broker.
Requested remote broker ports, received:

Port number decoding error:
need more than 1 value to unpack
SSH process stderr:
Connection to cl2n091.default.domain closed.

But it runs perfectly fine with only the short host names.
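
For what it's worth, the "need more than 1 value to unpack" message is what Python 2 reports when a tuple unpacking receives a single element, and the traceback shows nothing after "Requested remote broker ports, received:", so the reply from the remote broker appears to have been empty. The sketch below reproduces that kind of failure. The function names and the comma-separated reply format are assumptions made for illustration, not SCOOP's actual code.

```python
def parse_ports(reply):
    """Hypothetical parser: expects a reply such as '5555,5556'."""
    # An empty reply splits into [''], a single element, so unpacking
    # into two names raises ValueError.
    first, second = reply.strip().split(",")
    return int(first), int(second)


def try_parse(reply):
    """Return the parsed ports, or a short error description."""
    try:
        return parse_ports(reply)
    except ValueError as exc:
        return "decoding error: {0}".format(exc)


print(try_parse("5555,5556"))
print(try_parse(""))
```

On Python 2.7 the second call reports the same unpacking message as in the traceback; Python 3 words it differently but raises the same exception type.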

What version of the product are you using? 

Python 2.7.5
Scoop version 0.7.2

On what operating system?

SUSE Linux 11 

Please provide any additional information below.

I am actually trying to run this on our SGI cluster (SGI-customized SUSE 11), which 
uses PBS Pro as the scheduler. If I submit a job without the --hosts line, scoop 
correctly detects the hosts PBS has given the job, but it uses the full 
hostnames. If I submit a multi-node interactive job and manually provide the 
short names it works fine, but this is really not ideal, as it should be able to 
go through the batch system properly. 
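
As a possible stopgap inside a batch job, the short names could be derived from the node file that PBS provides. This is only a sketch: it assumes the node file path is available in the PBS_NODEFILE environment variable, that the entries are hostnames rather than IP addresses, and that the short names resolve on the cluster. The helper names are made up for illustration and are not part of scoop.

```python
import os


def short_hostnames(lines):
    """Strip the domain part from each hostname, keeping first-seen order."""
    seen = []
    for line in lines:
        name = line.strip()
        if not name:
            continue
        # Keep only the text before the first dot (assumes a hostname, not an IP).
        short = name.split(".", 1)[0]
        if short not in seen:
            seen.append(short)
    return seen


def hosts_from_nodefile(path=None):
    """Read short hostnames from the PBS node file (hypothetical helper)."""
    path = path or os.environ.get("PBS_NODEFILE")
    if not path:
        return []
    with open(path) as handle:
        return short_hostnames(handle)


if __name__ == "__main__":
    # Prints a space-separated list suitable for the --hosts argument.
    print(" ".join(hosts_from_nodefile()))
```

The printed list could then be passed to --hosts in the job script, in the same way as the manually typed short names.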

Original issue reported on code.google.com by davidwar...@gmail.com on 27 Aug 2014 at 10:22