galaxyproject / cloudman

Easily create and manage compute clusters on any Cloud.
https://galaxyproject.org/cloudman/

How to provide persistent_data.yaml and snaps.yaml, plus an uncaught URL error #29

Closed: MatthewRalston closed this issue 9 years ago

MatthewRalston commented 9 years ago

I am having trouble getting Cloudman to launch successfully. After some configuration, the cm_boot.py script runs fine with Cloudlaunch and downloads the tarball "cm.tar.gz" from my bucket. This tarball was pulled from the cloudman bucket on 10/29/15. cm_boot.py then extracts it and runs Cloudman's run.sh script.

2015-10-30 11:29:22,101 DEBUG  cm_boot:25  - Successfully ran '/bin/bash -l -c 'VIRTUALENVWRAPPER_LOG_DIR=/tmp/; HOME=/home/galaxy; . /home/galaxy/.venvburrito/startup.sh; workon CM; cd /mnt/cm; pip install -r /mnt/cm/requirements.txt; sh run.sh --daemon --log-file=/var/log/cloudman/cloudman.log''

I noticed that Cloudlaunch hangs when something goes wrong during Cloudman startup (see Cloudlaunch #46).

Midway through run.sh I get some 404s, because my bucket contains neither snaps.yaml nor persistent_data.yaml. I have found a few mentions of "persistent_data.yaml" in the documentation, but no examples or descriptions of what it should contain, and snaps.yaml is not described at all. What do I need to provide here, and do I need to supply these snapshots?

Next, an uncaught URL error occurs while parsing "s3.amazonaws.com:None". My s3 and ec2 ports are both null in the userData.yaml file. The error seems to originate in the method "get_file_from_public_bucket", and my default bucket is not public. The error seems to occur when fetching snaps.yaml but not when fetching persistent_data.yaml.
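A minimal Python 3 sketch of how this kind of failure can arise, assuming the public URL is assembled by string formatting with an `https` scheme. `naive_public_url` and the bucket name `my-bucket` are hypothetical and are not Cloudman's actual code. Formatting a None port yields the literal netloc `s3.amazonaws.com:None`, and URL parsers reject the non-numeric port; in the traceback below, requests reports the same problem as `InvalidURL`.

```python
from urllib.parse import urlsplit


def naive_public_url(host, port, bucket, key):
    # Hypothetical reconstruction of the failing pattern: the port is formatted
    # unconditionally, so a null (None) port becomes the literal text "None".
    return "https://{0}:{1}/{2}/{3}".format(host, port, bucket, key)


def port_is_valid(url):
    # Either urlsplit() or the .port conversion raises ValueError when the
    # port is not an integer (as with "None" here).
    try:
        urlsplit(url).port
    except ValueError:
        return False
    return True


url = naive_public_url("s3.amazonaws.com", None, "my-bucket", "snaps.yaml")
print(url)                 # https://s3.amazonaws.com:None/my-bucket/snaps.yaml
print(port_is_valid(url))  # False
```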

/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py:22: DeprecationWarning: Parameters to load are deprecated.  Call .resolve and .require separately.
  return pkg_resources.EntryPoint.parse("x=" + s).load(False)
Python version:  (2, 6)
Image configuration suports: {'apps': ['cloudman', 'galaxy']}
2015-10-30 11:29:26,124 DEBUG            app:74   Initializing app
2015-10-30 11:29:26,124 DEBUG            ec2:124  Gathering instance zone, attempt 0
2015-10-30 11:29:26,129 DEBUG            ec2:130  Instance zone is 'us-east-1a'
2015-10-30 11:29:26,129 DEBUG            ec2:48   Gathering instance ami, attempt 0
2015-10-30 11:29:26,131 DEBUG            app:77   Running on 'ec2' type of cloud in zone 'us-east-1a' using image 'ami-1234567'.
2015-10-30 11:29:26,131 DEBUG            app:95   Getting pd.yaml
2015-10-30 11:29:26,131 DEBUG            ec2:387  No S3 Connection, creating a new one.
2015-10-30 11:29:26,133 DEBUG            ec2:391  Got boto S3 connection.
2015-10-30 11:29:26,495 DEBUG           misc:578  Failed to get file 'persistent_data.yaml' from bucket 'my_company's_bucket_name': S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>persistent_data.yaml</Key><RequestId>foobar</RequestId><HostId>bobloblaw'slawblog</HostId></Error>
2015-10-30 11:29:26,495 DEBUG            app:102  Setting deployment_version to 2
2015-10-30 11:29:26,495 INFO             app:109  Master starting
2015-10-30 11:29:26,495 DEBUG         master:64   Initializing console manager - cluster start time: 2015-10-30 15:29:26.495641
2015-10-30 11:29:26,496 DEBUG           comm:42   AMQP Connection Failure:  [Errno 111] Connection refused
2015-10-30 11:29:26,496 DEBUG         master:857  Trying to discover any worker instances associated with this cluster...
2015-10-30 11:29:26,496 DEBUG            ec2:366  Establishing boto EC2 connection
2015-10-30 11:29:26,922 DEBUG            ec2:354  Got region as 'RegionInfo:us-east-1'
2015-10-30 11:29:28,012 DEBUG            ec2:375  Got boto EC2 connection for region 'us-east-1'
2015-10-30 11:29:28,305 DEBUG           misc:578  Failed to get file 'snaps.yaml' from bucket 'my_company's_bucket_name': S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>snaps.yaml</Key><RequestId>barbaz</RequestId><HostId>bobloblaw'slawblog</HostId></Error>
Traceback (most recent call last):
  File "./scripts/paster.py", line 24, in <module>
    command.run()
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/script/serve.py", line 284, in command
    relative_to=base, global_conf=vars)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/script/serve.py", line 321, in loadapp
    **kw)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
    return loadobj(APP, uri, name=name, **kw)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
    return context.create()
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 710, in create
    return self.object_type.invoke(self)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 229, in invoke
    filtered = context.next_context.create()
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 710, in create
    return self.object_type.invoke(self)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 146, in invoke
    return fix_call(context.object, context.global_conf, **context.local_conf)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/paste/deploy/util.py", line 56, in fix_call
    val = callable(*args, **kw)
  File "/mnt/cm/cm/buildapp.py", line 64, in app_factory
    app.startup()
  File "/mnt/cm/cm/app.py", line 111, in startup
    self.manager = master.ConsoleManager(self)
  File "/mnt/cm/cm/util/master.py", line 87, in __init__
    self.snaps = self._load_snapshot_data()
  File "/mnt/cm/cm/util/decorators.py", line 41, in df
    return fn(*args, **kwargs)
  File "/mnt/cm/cm/util/master.py", line 53, in newFunction
    return f(*args, **kw)
  File "/mnt/cm/cm/util/master.py", line 223, in _load_snapshot_data
    elif misc.get_file_from_public_bucket(self.app.ud, self.app.ud['bucket_default'], 'snaps.yaml', snaps_file):
  File "/mnt/cm/cm/util/misc.py", line 730, in get_file_from_public_bucket
    r = requests.get(url)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/requests/sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/requests/models.py", line 293, in prepare
    self.prepare_url(url, params)
  File "/home/galaxy/.virtualenvs/CM/lib/python2.6/site-packages/requests/models.py", line 347, in prepare_url
    raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: s3.amazonaws.com:None
Removing PID file cm_webapp.pid

nuwang commented 9 years ago

I was looking into the cloudman code, and it looks like the cloudman version you are using is a bit old: get_file_from_public_bucket was replaced by get_file_from_public_location on April 5th. Can you check whether get_file_from_public_bucket in cm/util/misc.py uses config.get or ud.get for properties like the port? I think it must be config.get() to work correctly with the cloudlaunch settings you mentioned here: galaxyproject/cloudlaunch#46.
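One rough way to answer that question without reading misc.py by hand is a text-based scan that lists which objects a function calls `.get(` on. `accessors_in_function` is a hypothetical helper, not part of Cloudman. It uses regular expressions instead of `ast` because the Cloudman code of this era targets Python 2.6, which Python 3's parser may reject. It only handles top-level `def` blocks and will stop early at a column-0 comment inside the function body.

```python
import re


def accessors_in_function(source, func_name):
    """Return the set of names used as `<name>.get(` inside a top-level function.

    Hypothetical helper, text-based on purpose: old Cloudman sources are
    Python 2, which Python 3's `ast` module may refuse to parse.
    """
    # From `def func_name(` up to the next line starting in column 0
    # (next def, decorator, or top-level statement) or the end of the text.
    pattern = r"^def %s\(.*?(?=^\S|\Z)" % re.escape(func_name)
    match = re.search(pattern, source, re.MULTILINE | re.DOTALL)
    if match is None:
        return set()
    # Collect the identifier immediately before each `.get(` call,
    # e.g. "ud" from `ud.get(` or "config" from `self.app.config.get(`.
    return set(re.findall(r"\b(\w+)\.get\(", match.group(0)))
```

For example, `accessors_in_function(open("/mnt/cm/cm/util/misc.py").read(), "get_file_from_public_bucket")` returning a set that contains `ud` but not `config` would point to the older behavior described here.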

It's still pretty weird that it's concatenating a null value, though; even the older code defaults to port 443 if the port is None.
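One possible explanation, offered as an assumption and not checked against the old misc.py: if the 443 default is applied with `dict.get(key, 443)`, it only takes effect when the key is absent. A userData.yaml entry such as `s3_port: null` loads as a key that is present with the value None, so `get` returns None. The sketch below (the key name `s3_port` and the helper `port_or_default` are hypothetical) treats a missing key and an explicit None the same way.

```python
def port_or_default(settings, key, default=443):
    """Return settings[key], falling back to `default` when the key is
    missing *or* explicitly set to None (e.g. `s3_port: null` in YAML)."""
    value = settings.get(key)
    return default if value is None else value


user_data = {"s3_port": None}  # hypothetical key; what `s3_port: null` loads as

print(user_data.get("s3_port", 443))                   # None: key exists, so no fallback
print(port_or_default(user_data, "s3_port"))           # 443
print(port_or_default({}, "s3_port"))                  # 443
print(port_or_default({"s3_port": 8443}, "s3_port"))   # 8443
```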

MatthewRalston commented 9 years ago

@nuwang you're right. The cm.tar.gz tarball I took from the "cloudman" S3 bucket contains an older version of cloudman: its misc.py still uses get_file_from_public_bucket, which calls ud.get instead of config.get. I will update my tarball and boot script, relaunch, and let you know what happens.

MatthewRalston commented 9 years ago

Updating my Python version (CentOS's devel copy is only 2.6.6) and updating the Cloudman source solved this problem.