GELOG / adamcloud

Portable cloud infrastructure for a genomic transformation pipeline using Adam

Optimize orchestration scripts #10

Closed davidonlaptop closed 9 years ago

davidonlaptop commented 9 years ago

Create a parameterizable script that can set up a complete genomic processing environment (snap, adam, avocado, spark, hdfs). The script should be the same regardless of whether it is executed on one machine or on a cluster.

Parameters

An array of objects with the following properties:

Let's start with the bash scripts created by Sebastien Bonami, and update them with the new docker images.

Potential solution

Fig seems to be the best solution, since it is developed by the Docker team and is planned to be integrated into the Docker project soon and renamed Docker Compose (see here, and here).

flangelier commented 9 years ago

DISCUSSION

Images we need:

davidonlaptop commented 9 years ago

Design Template

Service Design Template

This applies to Hadoop HDFS and Spark for now. Later on, we'll add MapReduce.

orchestrate <env> <service> <service-params>
orchestrate <env> hdfs <nn-host> <snn-host> <dn1-host> <dn2-host> ...
orchestrate <env> spark <spark-master-host> <worker1-host> <worker2-host> ...

# For localhost environment
# Uses config files (Hadoop): env/local/hadoop/hdfs-site.xml
# Uses config files (Spark): env/local/spark/spark-config-file.yml
orchestrate local hdfs localhost localhost localhost
orchestrate local spark localhost localhost

# For Mac mini cluster
# Uses config files (Hadoop): env/macmini/hadoop/hdfs-site.xml
# Uses config files (Spark): env/macmini/spark/spark-config-file.yml
orchestrate macmini hdfs mini1 mini1 mini2 mini3 mini4
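
To make the template above concrete, here is a minimal bash sketch of how `orchestrate` could map its positional arguments to roles. It is only an illustration under assumptions: the helper names `plan_hdfs` and `plan_spark`, the role labels, and the idea of passing the whole `env/<env>/<service-dir>` directory are hypothetical, and the functions only print a plan instead of starting containers.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn `orchestrate <env> <service> <hosts...>` into a role plan.
# Nothing is started here; each function prints "<role> <host> <config-dir>" lines.

# HDFS: first host is the NameNode, second the Secondary NameNode, the rest DataNodes.
plan_hdfs() {
  local env="$1" nn="$2" snn="$3"
  shift 3
  local conf="env/${env}/hadoop"   # assumed to hold hdfs-site.xml
  echo "namenode ${nn} ${conf}"
  echo "secondarynamenode ${snn} ${conf}"
  local dn
  for dn in "$@"; do
    echo "datanode ${dn} ${conf}"
  done
}

# Spark: first host is the master, the rest are workers.
plan_spark() {
  local env="$1" master="$2"
  shift 2
  local conf="env/${env}/spark"    # assumed to hold the Spark config file
  echo "master ${master} ${conf}"
  local worker
  for worker in "$@"; do
    echo "worker ${worker} ${conf}"
  done
}

# Entry point: validate the argument count, then dispatch on the service name.
orchestrate() {
  if [ "$#" -lt 3 ]; then
    echo "usage: orchestrate <env> <service> <service-params>" >&2
    return 2
  fi
  local env="$1" service="$2"
  shift 2
  case "$service" in
    hdfs)
      if [ "$#" -lt 3 ]; then
        echo "hdfs needs: <nn-host> <snn-host> <dn1-host> ..." >&2
        return 2
      fi
      plan_hdfs "$env" "$@"
      ;;
    spark)
      if [ "$#" -lt 2 ]; then
        echo "spark needs: <spark-master-host> <worker1-host> ..." >&2
        return 2
      fi
      plan_spark "$env" "$@"
      ;;
    *)
      echo "unknown service: ${service}" >&2
      return 2
      ;;
  esac
}
```

With these functions loaded, `orchestrate macmini hdfs mini1 mini1 mini2 mini3 mini4` prints one line per node, which a later step could feed to whatever actually launches the containers. Keeping the plan separate from the launch step is one way to keep the script identical on one machine and on a cluster.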

Genomic Design Template

# Spawns a docker container named 'snap1', 'snap2', ... with the requested params
snap <snap-host> <snap-params>

# Spawns a docker container named 'adam1', 'adam2', ... with the requested params
adam <adam-host> <adam-params>
# or, if needed:
adam <adam-host> <spark-host> <adam-params>

# and so on...
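
The numbered container names ('snap1', 'snap2', ...) could be picked as in the sketch below. This is an assumption about how numbering might work, not something decided in this issue: the function name `next_container_name` is hypothetical, and it reads the existing names from stdin so it can be tried without Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read existing container names on stdin (one per line)
# and print the first free "<prefix><n>" name, starting at n=1.
next_container_name() {
  local prefix="$1" n=1 existing
  existing="$(cat)"
  # -x matches whole lines only, -F treats the name as a fixed string.
  while printf '%s\n' "$existing" | grep -qxF -- "${prefix}${n}"; do
    n=$((n + 1))
  done
  echo "${prefix}${n}"
}
```

On a Docker host, the list of names could come from `docker ps -a --format '{{.Names}}' | next_container_name snap`. Note that this sketch reuses gaps in the numbering (if only 'snap2' exists, it returns 'snap1'), which may or may not be the desired behavior.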