Open jin0008 opened 1 month ago
Can you provide your logs that shows the error message?
There are no error messages; the process was interrupted without printing any. I attached the screenshot, along with chr14 variant calling (completed) and chr14 variant calling (interrupted). In the system monitor, GATK 4.6.0.0 consumes memory continuously, and when usage reaches 512 GB the process is interrupted. I tried running the process on only 2-3 chromosomes (using the -L interval option): it completed on chr14 and was interrupted on the other two. After I rolled back to GATK 4.5.0.0, the process ran normally, and I can run the GenotypeGVCFs command on all chromosomes simultaneously.
My machine is a Dell 7865 workstation with 512 GB of memory and 64 cores (AMD Threadripper 5995WX). Thanks, Jinu Han
Can you provide more details on the operating system you are using, along with related information such as the Java version?
Even if the process gets interrupted by the system, there must be a Java segfault message thrown by the process at some point. Did you observe any files with ERR in their names around the output file?
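As an illustration of the check being asked for here, the following is a minimal Python sketch, not part of GATK. It assumes the job is launched through `subprocess`, and the helper names `describe_exit` and `find_jvm_crash_logs` are hypothetical. HotSpot writes fatal error logs named `hs_err_pid<PID>.log`; one situation in which no JVM message would be expected is the kernel killing the process with SIGKILL (for example, by the Linux out-of-memory killer), since that signal cannot be caught.

```python
import signal
from pathlib import Path
from typing import List


def describe_exit(returncode: int) -> str:
    """Explain a return code as reported by subprocess.run() or a shell."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode > 128:
        # Shells report death-by-signal as 128 + signal number.
        try:
            return f"killed by {signal.Signals(returncode - 128).name}"
        except ValueError:
            pass  # not a valid signal number; fall through
    return f"exited with status {returncode}"


def find_jvm_crash_logs(directory: str) -> List[str]:
    """List HotSpot fatal error logs (hs_err_pid<PID>.log) in a directory."""
    return sorted(p.name for p in Path(directory).glob("hs_err_pid*.log"))


if __name__ == "__main__":
    print(describe_exit(-9))
    print(find_jvm_crash_logs("."))
```

A return code that maps to SIGKILL together with an empty list of crash logs would be consistent with the process being terminated from outside rather than crashing inside the JVM.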
Hi, the operating system is Ubuntu 20.04 and the Java version is openjdk "17.0.11". Whenever a GATK best-practices run was interrupted before, I could always see error messages. This time, however, the process was interrupted without giving any message, which is quite weird. I checked this on several other chromosomes. My callset has about 430 samples. I could run GenotypeGVCFs in GATK 4.5.0.0 without any problem, but in GATK 4.6.0.0 the process succeeded on only 3-4 chromosomes (the smaller ones, I think); on the others it was interrupted before completing. I could not find any ERR files in the folder. Thanks, Jinu Han
Can you tell us what heap size you are using for this task (-Xmx, -Xms)?
I have a similar issue. Weirdly, -Xmx does not help.
@icemduru Can you provide more details on your issue? How many samples do you have? How did you combine them and what are your command lines for this process? Can you provide more details on the system that you are running these commands on?
GenotypeGVCFs is not known to have memory leak issues. Our tests indicated that it needs only around 4-6 GB of total memory to genotype 120 whole-genome samples (per contig).
Thanks for the reply. I have 370 samples. I ran HaplotypeCaller on each of them, then ran GenomicsDBImport for each chromosome (it is a plant genome, about 420 Mb in total), and then tried to run GenotypeGVCFs for each chromosome. I attached the log file for chr1: slurm-22616776.out_text.txt
Hi @icemduru. It looks like your Slurm workload manager is configured with a limit of 48 GB of maximum process memory per execution. Your Java instance is set to -Xmx45G, which takes up most of this limit and leaves only a small amount of memory for the native GenomicsDB library. Native libraries allocate memory outside the Java heap, so it is better to set -Xmx to a more sensible 8-12 GB and leave the rest of the memory to the native library.
Keep in mind that this Slurm memory limit could be set per user rather than per task, so you may need to run a single contig at a time, or perhaps two simultaneously. Otherwise Slurm may interfere with all the tasks and cancel all your jobs.
One final reminder: we strongly recommend setting the temporary directory to somewhere other than /tmp. The Slurm workload manager interferes with this location and sometimes causes premature termination of GATK processes by deleting the extracted native library and accessory files.
I hope this helps.
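The advice above can be sketched as a small Python helper that computes how much of a job's memory limit is left for native code and assembles a GenotypeGVCFs command line with a modest heap and a non-/tmp temporary directory. This is only an illustration under assumptions: the paths, file names, and workspace name are hypothetical, and the helper functions are not part of GATK. The GATK options used (`--java-options`, `-R`, `-V gendb://...`, `-L`, `-O`, `--tmp-dir`) are standard ones.

```python
import shlex
from typing import List


def native_headroom_gb(job_limit_gb: float, xmx_gb: float) -> float:
    """Memory left for off-heap (native) allocations under a job limit."""
    return job_limit_gb - xmx_gb


def genotype_gvcfs_cmd(reference: str, workspace: str, interval: str,
                       output: str, xmx_gb: int, tmp_dir: str) -> List[str]:
    """Build a GenotypeGVCFs command reading from a GenomicsDB workspace."""
    return [
        "gatk", "--java-options", f"-Xmx{xmx_gb}g",
        "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-L", interval,          # one contig per job
        "-O", output,
        "--tmp-dir", tmp_dir,    # somewhere other than /tmp
    ]


if __name__ == "__main__":
    # With a 48 GB limit, -Xmx45G leaves 3 GB for native code; -Xmx10g leaves 38 GB.
    print(native_headroom_gb(48, 45), native_headroom_gb(48, 10))
    # Hypothetical paths, for illustration only.
    cmd = genotype_gvcfs_cmd("ref.fasta", "chr1_db", "chr1",
                             "chr1.vcf.gz", 10, "/scratch/gatk_tmp")
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed directly to `subprocess.run` without shell quoting issues.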
Thank you for your help, but unfortunately it didn't resolve the issue. I've already tried allocating 10GB of memory using the -Xmx10g flag and redirecting the temporary directory away from /tmp. However, GATK is still attempting to consume more than 48GB of RAM, resulting in the termination of my run. slurm-22680938.out_text.txt
Hi again.
Did you add the --consolidate true parameter to GenomicsDBImport during the import stage? This step collapses each layer of the import into a single layer, which prevents tools from opening too many files at once, though it may take some time at the end of the import stage. It also reduces the amount of bookkeeping the genotyper has to do.
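For reference, an import command with consolidation enabled might be assembled as in the Python sketch below. The workspace path, sample map file, and temporary directory are hypothetical, and the helper function is illustrative rather than part of GATK; `--genomicsdb-workspace-path`, `--sample-name-map`, `-L`, `--consolidate`, and `--tmp-dir` are existing GenomicsDBImport options.

```python
from typing import List


def genomicsdb_import_cmd(workspace: str, sample_map: str, interval: str,
                          tmp_dir: str, xmx_gb: int = 8) -> List[str]:
    """Build a GenomicsDBImport command with consolidation enabled."""
    return [
        "gatk", "--java-options", f"-Xmx{xmx_gb}g",
        "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "--sample-name-map", sample_map,
        "-L", interval,
        "--consolidate", "true",   # merge import layers at the end of the import
        "--tmp-dir", tmp_dir,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(genomicsdb_import_cmd(
        "chr1_db", "samples.map", "chr1", "/scratch/gatk_tmp")))
```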
Hi,
Thanks for the suggestion. I did use the --consolidate true parameter with GenomicsDBImport during the import stage; however, it did not help. I solved my problem by using large-memory machines. For future reference, the required memory was 95.11 GB for the 370-sample dataset with -Xmx8G and --disable-bam-index-caching true.
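The figures reported above can be turned into a rough sizing calculation. The Python sketch below is an illustration only: the 10% safety margin is an assumption, the helper names are hypothetical, and treating everything above the heap cap as off-heap usage is a simplification. `--mem` is the standard Slurm option for a per-node memory request.

```python
import math


def off_heap_estimate_gb(peak_gb: float, xmx_gb: float) -> float:
    """Lower bound on memory used outside the Java heap at peak."""
    return round(peak_gb - xmx_gb, 2)


def suggest_slurm_mem(peak_gb: float, margin: float = 0.10) -> str:
    """Suggest a Slurm --mem request: observed peak plus an assumed margin."""
    return f"--mem={math.ceil(peak_gb * (1 + margin))}G"


if __name__ == "__main__":
    # Reported run: 95.11 GB peak with -Xmx8G on 370 samples.
    print(off_heap_estimate_gb(95.11, 8))   # memory outside the 8 GB heap
    print(suggest_slurm_mem(95.11))         # request with a 10% margin
```

With these numbers, at least about 87 GB was used outside the Java heap, which is why lowering -Xmx alone could not bring the job under a 48 GB limit.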
Bug Report
Affected tool(s) or class(es)
GenotypeGVCFs
Affected version(s)
4.6.0.0
Description
When I ran GenotypeGVCFs on a GenomicsDB of 420 samples, the process was interrupted due to significant memory issues; it consumed memory continuously. In 4.5.0.0 I ran the same process and confirmed that it works fine.