ceteri / exelixi

Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It is intended to run cluster computing jobs (partitioned batch jobs, which include some messaging) in pure Python. By default, it runs genetic algorithms at scale.
Apache License 2.0

Is HDFS a hard requirement to set up/run the exelixi framework? #2

Open dbsiegel opened 10 years ago

dbsiegel commented 10 years ago

Is HDFS a hard requirement to set up/run exelixi?

Hadoop is not currently part of the playa-mesos box image, so install.sh fails at the hadoop fs commands.

The playa-mesos team is thinking of adding support for a single-node Hadoop instance configured for pseudo-distributed operation. Would that work?

ceteri commented 10 years ago

Great to see this issue on GH!

Yes, HDFS is required when running on Mesos. Py code gets distributed via HDFS onto the slave nodes. That's pretty standard for how we use Spark and other frameworks.

However, you could run in standalone mode w/o Mesos -- that's mentioned in the "Getting Started" section of the wiki.

Huh... playa-mesos may need to rethink, since HDFS is needed by many popular Mesos use cases. Also, pseudo-distributed mode Hadoop is generally a bad idea. I should have a discussion with Jeremy about that...

Meanwhile, awesome gravatar there :)

dbsiegel commented 10 years ago

Thanks :) I will run in standalone mode for now w/o Mesos, on Mac OS X. Perhaps I've missed something: I see an import error when launching the framework, ImportError: cannot import name shutdown (raised by the gevent import in service.py).

ceteri commented 10 years ago

Gevent should have shutdown as a standard part of the package.

Try running just the Python prompt from the command line, then type

from gevent import shutdown

Does it give the same error? If you've got a GitHub gist of the full error trace, that'd probably help too.

Thanks,
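For anyone following along, the check suggested above can be sketched as a small script. This is only an illustration: the helper names `check_name` and `describe` are hypothetical and not part of exelixi or gevent. It reports whether a module exposes a given name and, if the module provides `__version__`, which version is installed.

```python
import importlib


def check_name(module_name, attr):
    """Return True/False if the module exposes attr, or None if the
    module itself cannot be imported. (Hypothetical helper.)"""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


def describe(module_name, attr):
    """Build a one-line report that is easy to paste into an issue."""
    found = check_name(module_name, attr)
    if found is None:
        return "%s is not installed" % module_name
    module = importlib.import_module(module_name)
    # Not every package defines __version__, so fall back to "unknown".
    version = getattr(module, "__version__", "unknown")
    status = "exposes" if found else "does not expose"
    return "%s %s %s %s" % (module_name, version, status, attr)


if __name__ == "__main__":
    # The same question as `from gevent import shutdown`, plus the version.
    print(describe("gevent", "shutdown"))
```

Pasting the printed line next to the full error trace shows both the installed gevent version and whether `shutdown` is available in it.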

dbsiegel commented 10 years ago

same error. I would embed this gist but don't know how at the moment. https://gist.github.com/d3borah/d83eefec307076371e8d

ceteri commented 10 years ago

Dang. This may require customer support on-site.

One thing that may help is to try running under Anaconda, instead of the default Py 2.7.x that comes installed on Mac OSX:

https://store.continuum.io/cshop/anaconda/

It should be quick to install, and is easily reversed.

ceteri commented 10 years ago

Also, just checked my local set up:

pacos-mbp-3:c3nom ceteri$ pip freeze | grep gevent
gevent==0.13.8
gevent-websocket==0.3.6
gevent-zeromq==0.2.2

So that's a very different version of gevent. Will check next time when running in AWS, but it was the same previously.

dbsiegel commented 10 years ago

Thanks, will look into Anaconda for this. I am just getting into Python, so I'm not invested in the default.

Your version of gevent is close to what's available on the playa-mesos box. However, the playa-mesos box apparently does not have hat_trie at the moment.

vagrant@mesos:~/exelixi$ pip freeze
Cython==0.19.2
Pillow==2.0.0
apt-xapian-index==0.45
argparse==1.2.1
chardet==2.0.1
distribute==0.6.34
gevent==0.13.7
greenlet==0.4.0
mesos-0.14.0==rc4-amd64
mesos-0.15.0==rc4-amd64
numpy==1.7.1
pandas==0.13.0
protobuf==2.4.1
psutil==0.6.1
python-apt==0.8.8ubuntu6
python-dateutil==2.2
python-debian==0.1.21-nmu2ubuntu1
pytz==2013.9
requests==1.1.0
scikit-learn==0.14.1
scipy==0.11.0
six==1.2.0
ssh-import-id==3.14
urllib3==1.5
wsgiref==0.1.2
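Comparing two `pip freeze` listings by eye is error-prone, so here is a rough sketch of doing it in Python. The function names `parse_freeze` and `missing` are hypothetical, the vagrant listing is abridged from the output above, and the assumption that the package would be reported as `hat-trie` comes from the `#egg=hat-trie` suffix of its install URL rather than from a confirmed listing.

```python
def parse_freeze(text):
    """Turn `pip freeze` output into a {name: version} dict.

    Splitting on whitespace handles both one-per-line output and
    output that was flattened onto a single line.
    """
    packages = {}
    for token in text.split():
        name, sep, version = token.partition("==")
        if sep:  # skip anything that is not name==version
            packages[name.lower()] = version
    return packages


def missing(packages, required):
    """List the required package names absent from a parsed freeze."""
    return [name for name in required if name.lower() not in packages]


# Listings copied from this thread (the vagrant one is abridged).
local = parse_freeze(
    "gevent==0.13.8 gevent-websocket==0.3.6 gevent-zeromq==0.2.2"
)
vagrant = parse_freeze(
    "Cython==0.19.2 gevent==0.13.7 greenlet==0.4.0 numpy==1.7.1"
)

print("gevent:", local["gevent"], "vs", vagrant["gevent"])
# "hat-trie" is an assumed distribution name (see the note above).
print("missing on vagrant box:", missing(vagrant, ["gevent", "hat-trie"]))
```

The local listing was filtered with `grep gevent`, so it is only useful for comparing gevent versions, not for checking what else is installed there.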

ceteri commented 10 years ago

That makes sense; playa-mesos will have introduced some other versions/dependencies then.

Yes, the hat_trie package needs to be installed from GitHub, since there's no PyPI support for it yet. You can use the command from bin/local_install.sh:

sudo pip install git+https://github.com/kmike/hat-trie.git#egg=hat-trie

That requires Git to be installed first, too.

dbsiegel commented 10 years ago

:+1: :) :)