Closed gianfilippo closed 7 months ago
UPDATE: I issued the same "gatk MarkDuplicates" command as in the log file on an interactive node with only 32 GB of memory, and it completed. Maybe the problem is with DROP/Snakemake's default settings for memory management, but I am not sure how to change them. Any suggestions?
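For reference, one way to change Snakemake's defaults without editing the workflow itself is the `--default-resources` command-line option; this is a hedged sketch (the memory value is an example, and whether DROP exposes the underlying `snakemake` invocation this way would need to be checked):

```
# Illustrative only: raise the default per-job memory request to 64 GB
# so that rules without an explicit resources block inherit it.
snakemake --default-resources mem_mb=64000 --cores 10
```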
Hi, I think there was an issue with maskMultiVCF because a path couldn't be accessed. It is working now, so that could have been the cause. 180 GB for 10 samples should be more than enough. You could add specific resource allocations to the headers of the scripts.
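A minimal sketch of what such a per-rule resource allocation could look like in a Snakemake rule (the rule body, file names, and memory value here are illustrative, not DROP's actual code):

```
# Hypothetical rule showing a per-rule memory request that cluster
# executors (e.g. SLURM) can pass through to the scheduler.
rule markDuplicates:
    input:
        "sample_Aligned.sortedByCoord.out.bam"
    output:
        "sample_Aligned.sortedByCoord.dupMarked.out.bam"
    resources:
        mem_mb=32000  # request 32 GB for this rule only
    shell:
        "gatk MarkDuplicates -I {input} -O {output} -M dup-metrics.txt"
```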
Hi, just completed a rerun and it worked. Thanks!
Hi,
I am trying to run the MAE and rnaVariantCalling modules and I am getting an OOM error in markDuplicates (see below).
I am submitting this as a SLURM job, and I allocated 10 cores and 180 GB for the last run. I do not recall (I may be wrong) having to allocate more memory when running a GATK-based pipeline on RNA-seq data. Should I just allocate more memory, or should I use the config file to manage it?
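If the JVM heap rather than the SLURM allocation is the limit, GATK's `--java-options` flag can bound it explicitly; a hedged example (the heap size and file names are placeholders, not the pipeline's actual settings):

```
# Illustrative only: cap the MarkDuplicates JVM heap at 28 GB,
# leaving headroom below a 32 GB job allocation.
gatk --java-options "-Xmx28g" MarkDuplicates \
    -I input_Aligned.sortedByCoord.out.bam \
    -O output_Aligned.sortedByCoord.dupMarked.out.bam \
    -M marked-dup-metrics.txt \
    --CREATE_INDEX true \
    --TMP_DIR /tmp
```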
Thanks
Error in rule markDuplicates:
    jobid: 247
    input: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam.bai
    output: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bai
    log: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log (check log file(s) for error details)
    shell:
DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt --CREATE_INDEX true --TMP_DIR "/tmp" --VALIDATION_STRINGENCY SILENT 2> DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log
Removing output files of failed job markDuplicates since they might be corrupted: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bai
Below is the end of the sample-specific log file DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log:
INFO 2024-02-05 23:22:39 MarkDuplicates Sorting list of duplicate records.
INFO 2024-02-05 23:22:42 MarkDuplicates After generateDuplicateIndexes freeMemory: 17094205248; totalMemory: 25199378432; maxMemory: 32178700288
INFO 2024-02-05 23:22:42 MarkDuplicates Marking 29761681 records as duplicates.
INFO 2024-02-05 23:22:42 MarkDuplicates Found 3318 optical duplicate clusters.
INFO 2024-02-05 23:22:42 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2024-02-05 23:23:26 MarkDuplicates Written 10,000,000 records. Elapsed time: 00:00:44s. Time for last 10,000,000: 44s. Last read position: chr4:73,408,762
INFO 2024-02-05 23:24:10 MarkDuplicates Written 20,000,000 records. Elapsed time: 00:01:28s. Time for last 10,000,000: 44s. Last read position: chr8:108,203,053
INFO 2024-02-05 23:25:00 MarkDuplicates Written 30,000,000 records. Elapsed time: 00:02:18s. Time for last 10,000,000: 49s. Last read position: chr14:94,378,547
INFO 2024-02-05 23:25:44 MarkDuplicates Written 40,000,000 records. Elapsed time: 00:03:02s. Time for last 10,000,000: 43s. Last read position: chr19:58,355,146
INFO 2024-02-05 23:26:18 MarkDuplicates Written 50,000,000 records. Elapsed time: 00:03:36s. Time for last 10,000,000: 33s. Last read position: chrM:8,968
INFO 2024-02-05 23:26:40 MarkDuplicates Writing complete. Closing input iterator.
INFO 2024-02-05 23:26:40 MarkDuplicates Duplicate Index cleanup.
INFO 2024-02-05 23:26:40 MarkDuplicates Getting Memory Stats.
INFO 2024-02-05 23:26:40 MarkDuplicates Before output close freeMemory: 288414536; totalMemory: 335544320; maxMemory: 32178700288
INFO 2024-02-05 23:26:40 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2024-02-05 23:26:40 MarkDuplicates After output close freeMemory: 188927600; totalMemory: 234881024; maxMemory: 32178700288
[Mon Feb 05 23:26:40 EST 2024] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 9.16 minutes.
Runtime.totalMemory()=234881024
Using GATK jar $HOME/.conda/envs/drop_env_133/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar $HOME/.conda/envs/drop_env_133/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar MarkDuplicates -I DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam -O DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam -M DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt --CREATE_INDEX true --TMP_DIR /tmp --VALIDATION_STRINGENCY SILENT