broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Spark cluster only able to run one worker. #6233

Closed jackycsie closed 4 years ago

jackycsie commented 4 years ago

We ran SortSamSpark with GATK 4.1.4.0 and found that the Spark cluster only uses one worker; the workers on the other servers are never used. What could be the reason?

Thanks.

My command is:

```
gatk SortSamSpark --input ERR194147.bam \
  --output /mnt/jacky/ERR194147.sorted.bam \
  --create-output-bam-index False --tmp-dir . \
  -- --spark-runner SPARK \
  --spark-master spark://172.16.96.98:7077 \
  --executor-memory 720G
```

lbergelson commented 4 years ago

What is your cluster configuration?

That's a lot of memory for one executor. The cluster may be unable to allocate a worker with that much memory, or it may be putting all of the memory on one very large executor. Have you tried setting executor cores as well? I would usually set something like `--executor-cores 4 --executor-memory 16G`. You want to size your executors so they fit evenly into the worker nodes on your cluster, without too many cores per executor.
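As a rough illustration of the sizing advice above, the number of executors a worker can host is limited by whichever resource runs out first, cores or memory. The worker specs below (64 cores, 256 GB) are hypothetical examples, not taken from the reporter's cluster:

```python
# Sketch: how many executors of a given size fit on one Spark worker node.
# Worker specs here are hypothetical; plug in your own cluster's values.

def executors_per_worker(worker_cores, worker_mem_gb, exec_cores, exec_mem_gb):
    """Return how many executors of the requested size one worker can host."""
    by_cores = worker_cores // exec_cores   # limited by available cores
    by_mem = worker_mem_gb // exec_mem_gb   # limited by available memory
    return min(by_cores, by_mem)

# 4-core / 16 GB executors pack 16 to a hypothetical 64-core / 256 GB worker:
print(executors_per_worker(64, 256, 4, 16))   # -> 16

# A 720 GB executor cannot be scheduled on such a worker at all,
# which would leave the job running on whatever single node can fit it:
print(executors_per_worker(64, 256, 4, 720))  # -> 0
```

The point is that an executor request larger than any single worker's memory either fails to schedule or collapses the job onto one oversized node, which matches the one-worker symptom reported above.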

An aside, you should be able to create a bam index as part of SortSamSpark now, we have support for generating it in parallel and merging the indexes.

jackycsie commented 4 years ago

@lbergelson Thank you~

When I reduced the memory setting so that the other workers in the cluster could be allocated, gatk was able to use multiple workers.

In addition, the reason I set `--create-output-bam-index False` is that my bai files can't be merged, so this was the only way I could run it.

Thanks jacky

lbergelson commented 4 years ago

I'm not totally clear from your response, but it sounds like you've resolved the problem?

If you're encountering a bug when merging bai files, could you open an issue describing it, with your stack trace and any relevant information about the configuration you're running?