GELOG / adamcloud

Portable cloud infrastructure for a genomic transformation pipeline using Adam

Run the pipeline on a small / medium / large genome #11

Open davidonlaptop opened 9 years ago

davidonlaptop commented 9 years ago

With updated versions of the BDGenomics pipeline (SNAP, ADAM, Avocado), use a small / medium / large genome to validate the new images and the orchestration scripts. We'll compare these results with those from S. Bonami and the BDGenomics papers.

References

TODO: Find the data used in the Snap / Adam / Avocado papers.

flangelier commented 9 years ago

Avocado keeps failing at the moment...

davidonlaptop commented 9 years ago

what's the error message?

davidonlaptop commented 9 years ago

(let's leave comments here, for future tracking)

From François Langelier's email:

2015-03-22 17:29:25 ERROR Executor:96 - Exception in task 0.0 in stage 1.0 (TID 16)
java.lang.OutOfMemoryError: GC overhead limit exceeded

davidonlaptop commented 9 years ago

Potential solution: https://plumbr.eu/outofmemoryerror/gc-overhead-limit-exceeded
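For reference, the mitigations that article discusses map to standard HotSpot options; the values below are illustrative, not something tested against this pipeline:

-Xmx2g                     # raise the maximum heap size
-XX:-UseGCOverheadLimit    # turn off the GC overhead check (usually just hides the real problem)

Note that for Spark executors the heap is normally controlled through spark.executor.memory rather than a raw -Xmx, which is what the workaround further down this thread does.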

codingtony commented 9 years ago

If you can run it with jvisualvm attached, you will have a better idea of what is using the memory and what the heap usage looks like in general.

What JVM parameters are you currently using?

-tony
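For anyone following this suggestion: jvisualvm can attach to a local JVM directly by PID, but if the Spark worker runs inside a Docker container or on another host, the JVM needs remote JMX enabled first. A minimal sketch using the standard JMX system properties (the port and hostname are placeholders; authentication is disabled, so this is for debugging only):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=<worker-host>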


davidonlaptop commented 9 years ago

Currently it is running with the default settings. I think there is not enough memory allocated to the Spark workers.

François is writing up the step-by-step instructions to reproduce the problem.
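For context: Spark versions of this era default spark.executor.memory to 512m, which is easily too small for a variant-calling workload. Assuming the job is launched through spark-submit, the allocation can be raised at submit time; the 2g values below are illustrative:

spark-submit \
  --driver-memory 2g \
  --executor-memory 2g \
  ...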

sebastienbonami commented 9 years ago

@flangelier I think I encountered the same problem as you with Avocado! Lower the values on the two lines below in the file bin/avocado-submit. 4g is probably more than what you have available in memory, so the JVM can't start.

--conf spark.executor.memory=${AVOCADO_EXECUTOR_MEMORY:-4g} \
--driver-memory ${AVOCADO_DRIVER_MEMORY:-4g} \

See: https://github.com/bigdatagenomics/avocado/blob/master/bin/avocado-submit#L56-58
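Since those lines use bash default-value expansion (${VAR:-default}), the 4g defaults can also be overridden from the environment without editing the script. A sketch, assuming 2g actually fits on the machine:

export AVOCADO_EXECUTOR_MEMORY=2g
export AVOCADO_DRIVER_MEMORY=2g
bin/avocado-submit ...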