broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

GenomicsDBImport fails with 2500 samples: OSError: [Errno 7] Argument list too long: '/bin/bash' #6428

Closed by amizeranschi 4 years ago

amizeranschi commented 4 years ago

Hello,

I'm running into issues with joint genotyping on 2500 (small) GVCF files. I'm running GATK4 from bcbio-nextgen on an SGE cluster and GenomicsDBImport runs into: OSError: [Errno 7] Argument list too long: '/bin/bash'.

I've checked bcbio's logs, and it looks like GenomicsDBImport is being run on all 2500 samples in a single command, with every GVCF passed as its own --variant argument. The option --batch-size 50 is set, but it doesn't seem to help in my case:

[2020-01-30T06:56Z] haswell-wn29.grid.pub.ro: unset JAVA_HOME && export PATH=/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/bin:"$PATH" && gatk --java-options '-Xms800m -Xmx11466m -XX:+UseSerialGC -Djava.io.tmpdir=/export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC/work/joint/gatk-haplotype-joint/testingVC/chr7/bcbiotx/tmp2bu4nmgf' GenomicsDBImport --reader-threads 4 --genomicsdb-workspace-path testingVC-chr7_0_141973873_genomicsdb -L chr7:1-141973873 --variant /export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC/work/precalled/HG00096-joint-gatk-haplotype-annotated-precalled.vcf.gz --variant /export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC/work/precalled/HG00097-joint-gatk-haplotype-annotated-precalled.vcf.gz --variant /export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC/work/precalled/HG00099-joint-gatk-haplotype-annotated-precalled.vcf.gz [...] --batch-size 50
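
As a rough check of why this command is too long, the sketch below estimates the size of the --variant arguments alone for 2500 samples. The sample names are hypothetical stand-ins shaped like the paths in the log, and the 128 KiB per-argument limit is an assumption about typical Linux kernels (32 pages of 4 KiB). Whether that limit or the total ARG_MAX is the one being hit here is not confirmed in this thread.

```python
import errno
import os

# Assumption: Linux caps a single argument string at 32 pages (128 KiB with 4 KiB pages).
ASSUMED_MAX_SINGLE_ARG = 32 * 4096


def variant_args_length(paths):
    """Return the byte length of the '--variant <path>' arguments joined by spaces."""
    return len(" ".join(f"--variant {p}" for p in paths).encode("utf-8"))


# Hypothetical sample names; the directory prefix is the one shown in the log.
prefix = "/export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC/work/precalled/"
paths = [
    f"{prefix}SAMPLE{i:05d}-joint-gatk-haplotype-annotated-precalled.vcf.gz"
    for i in range(2500)
]

n_bytes = variant_args_length(paths)
print("errno 7 is", errno.errorcode[7])
print(n_bytes, "bytes of --variant arguments")
print("exceeds assumed per-argument limit:", n_bytes > ASSUMED_MAX_SINGLE_ARG)

# The total limit for arguments plus environment, where the platform reports it.
if hasattr(os, "sysconf") and "SC_ARG_MAX" in os.sysconf_names:
    print("system ARG_MAX:", os.sysconf("SC_ARG_MAX"))
```

If the whole command is handed to /bin/bash as one string, the per-argument limit would apply to the entire line, which would explain why --batch-size has no effect on this error: the failure happens before GATK starts.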

Is there a way I can get things running with the 2500 samples?

For more details, including the full stack trace, please see: https://github.com/bcbio/bcbio-nextgen/issues/3074.

ldgauthier commented 4 years ago

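This should work if you specify the samples and paths in a tab-separated sample map file (passed with --sample-name-map) instead of individual --variant arguments: column 1 is the sample name, column 2 is the path, and there is no header row.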
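
A minimal sketch of that suggestion in Python: it writes the two-column, header-less map file and builds a GenomicsDBImport command that refers to the map instead of listing every GVCF. The helper names `write_sample_map` and `build_command`, the file name `samples.map`, and the relative example paths are hypothetical. Deriving the sample name by stripping the file suffix seen in the log is an assumption, not something GATK requires.

```python
import csv
import tempfile
from pathlib import Path

# Suffix copied from the GVCF file names in the log; stripping it to get the
# sample name is an assumption made for this sketch.
SUFFIX = "-joint-gatk-haplotype-annotated-precalled.vcf.gz"


def write_sample_map(gvcf_paths, map_path):
    """Write one 'sample<TAB>path' line per GVCF, with no header row."""
    rows = []
    for p in gvcf_paths:
        name = Path(p).name
        sample = name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
        rows.append((sample, str(p)))
    with open(map_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
    return rows


def build_command(map_path, workspace, interval, batch_size=50):
    """Build the argument list; its length no longer grows with the sample count."""
    return [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", interval,
        "--sample-name-map", str(map_path),
        "--batch-size", str(batch_size),
    ]


# Hypothetical relative paths using sample names from the log.
example_gvcfs = [f"precalled/{s}{SUFFIX}" for s in ("HG00096", "HG00097", "HG00099")]

with tempfile.TemporaryDirectory() as tmp:
    map_file = Path(tmp) / "samples.map"
    rows = write_sample_map(example_gvcfs, map_file)
    written = map_file.read_text()
    command = build_command(
        map_file, "testingVC-chr7_0_141973873_genomicsdb", "chr7:1-141973873"
    )

print(written, end="")
print(" ".join(command))
```

Because the sample list lives in a file, the command line stays the same length for 3 samples or 2500, and --batch-size can then control how many samples are imported at a time.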

amizeranschi commented 4 years ago

Thanks a lot for the suggested fix. This has now been implemented in bcbio-nextgen and it works perfectly.