Closed jackycsie closed 4 years ago
What is your cluster configuration?
That's a lot of memory for one executor. The cluster may be having trouble allocating workers with that much memory, or it may be putting all of the memory onto one very large executor.
Have you tried setting executor cores as well? I would usually use something like `--executor-cores 4 --executor-memory 16G`. You want to size your executors so they fit evenly into the worker nodes on your cluster without having too many cores per executor.
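For illustration, a launch along these lines splits the work across many smaller executors instead of one huge one. The file names, master address, and sizes below are placeholders, not values from this thread:

```shell
# Hypothetical example: request several 4-core/16G executors rather than
# a single 720G executor, so executors tile the worker nodes evenly.
gatk SortSamSpark \
    --input input.bam \
    --output sorted.bam \
    --tmp-dir /tmp \
    -- \
    --spark-runner SPARK \
    --spark-master spark://master-host:7077 \
    --executor-cores 4 \
    --executor-memory 16G
```

With this shape, a worker node with, say, 32 cores and 128G of RAM can host eight executors instead of being too small to fit one giant executor.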
As an aside, you should be able to create a BAM index as part of SortSamSpark now; we have support for generating it in parallel and merging the indexes.
@lbergelson Thank you~
After I set the reducer threshold, GATK was able to use the other workers in the cluster.
In addition, I set `--create-output-bam-index False` because my .bai files couldn't be merged, so this was the only workaround.
Thanks jacky
I'm not totally clear from your response but I think you've resolved the problem?
If you're encountering a bug merging .bai files, could you open an issue describing it, with your stack trace and any relevant information about the configuration you're running?
We run SortSamSpark with GATK 4.1.4.0. We found that when running it, the Spark cluster uses only one worker, and the workers on the other servers sit idle. What could be the reason?
Thanks.
My command is:

```
gatk SortSamSpark --input ERR194147.bam \
    --output /mnt/jacky/ERR194147.sorted.bam \
    --create-output-bam-index False \
    --tmp-dir . \
    -- \
    --spark-runner SPARK \
    --spark-master spark://172.16.96.98:7077 \
    --executor-memory 720G
```