gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
MIT License

OOM error in markDuplicates #519

Closed · gianfilippo closed this 7 months ago

gianfilippo commented 7 months ago

Hi,

I am trying to run the MAE and rnaVariantCalling modules and I am getting an OOM error in markDuplicates (see below).

I am submitting this as a SLURM job, and I allocated 10 cores and 180 GB for the last run. I do not recall (I may be wrong) having to allocate more memory when running a GATK-based pipeline on RNA-seq data. Should I just allocate more memory, or use the config file to manage it?
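For reference, if simply raising the Java heap is the answer, here is roughly what I would try by hand. I believe the gatk wrapper accepts --java-options before the tool name; the paths are copied from the log below, so this is a sketch of a manual rerun, not what DROP actually runs:

    # Hypothetical manual rerun of the failing step with a 32 GB Java heap.
    # Paths are copied from the Snakemake error message below.
    gatk --java-options "-Xmx32g" MarkDuplicates \
        -I DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam \
        -O DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam \
        -M DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt \
        --CREATE_INDEX true --TMP_DIR "/tmp" --VALIDATION_STRINGENCY SILENT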

Thanks

Error in rule markDuplicates:
    jobid: 247
    input: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam.bai
    output: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bai
    log: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log (check log file(s) for error details)
    shell:
        gatk MarkDuplicates -I DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam -O DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam -M DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt --CREATE_INDEX true --TMP_DIR "/tmp" --VALIDATION_STRINGENCY SILENT 2> DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job markDuplicates since they might be corrupted: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bai

Below is the end of the sample-specific log file DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log:

    INFO 2024-02-05 23:22:39 MarkDuplicates Sorting list of duplicate records.
    INFO 2024-02-05 23:22:42 MarkDuplicates After generateDuplicateIndexes freeMemory: 17094205248; totalMemory: 25199378432; maxMemory: 32178700288
    INFO 2024-02-05 23:22:42 MarkDuplicates Marking 29761681 records as duplicates.
    INFO 2024-02-05 23:22:42 MarkDuplicates Found 3318 optical duplicate clusters.
    INFO 2024-02-05 23:22:42 MarkDuplicates Reads are assumed to be ordered by: coordinate
    INFO 2024-02-05 23:23:26 MarkDuplicates Written 10,000,000 records. Elapsed time: 00:00:44s. Time for last 10,000,000: 44s. Last read position: chr4:73,408,762
    INFO 2024-02-05 23:24:10 MarkDuplicates Written 20,000,000 records. Elapsed time: 00:01:28s. Time for last 10,000,000: 44s. Last read position: chr8:108,203,053
    INFO 2024-02-05 23:25:00 MarkDuplicates Written 30,000,000 records. Elapsed time: 00:02:18s. Time for last 10,000,000: 49s. Last read position: chr14:94,378,547
    INFO 2024-02-05 23:25:44 MarkDuplicates Written 40,000,000 records. Elapsed time: 00:03:02s. Time for last 10,000,000: 43s. Last read position: chr19:58,355,146
    INFO 2024-02-05 23:26:18 MarkDuplicates Written 50,000,000 records. Elapsed time: 00:03:36s. Time for last 10,000,000: 33s. Last read position: chrM:8,968
    INFO 2024-02-05 23:26:40 MarkDuplicates Writing complete. Closing input iterator.
    INFO 2024-02-05 23:26:40 MarkDuplicates Duplicate Index cleanup.
    INFO 2024-02-05 23:26:40 MarkDuplicates Getting Memory Stats.
    INFO 2024-02-05 23:26:40 MarkDuplicates Before output close freeMemory: 288414536; totalMemory: 335544320; maxMemory: 32178700288
    INFO 2024-02-05 23:26:40 MarkDuplicates Closed outputs. Getting more Memory Stats.
    INFO 2024-02-05 23:26:40 MarkDuplicates After output close freeMemory: 188927600; totalMemory: 234881024; maxMemory: 32178700288
    [Mon Feb 05 23:26:40 EST 2024] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 9.16 minutes. Runtime.totalMemory()=234881024
    Using GATK jar $HOME/.conda/envs/drop_env_133/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
    Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar $HOME/.conda/envs/drop_env_133/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar MarkDuplicates -I DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam -O DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam -M DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt --CREATE_INDEX true --TMP_DIR /tmp --VALIDATION_STRINGENCY SILENT

gianfilippo commented 7 months ago

UPDATE: I issued the same "gatk MarkDuplicates" command as in the log file on an interactive node with only 32 GB of memory, and it completed. Maybe the problem is with the DROP/Snakemake default settings for memory management, but I am not sure how to change those. Any suggestions?
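In case it helps, I think recent Snakemake versions can override per-rule resources from the command line; a sketch, assuming the rule is named markDuplicates as in the error above and that the Snakemake version pinned by DROP supports --set-resources:

    # Hypothetical per-rule override: give markDuplicates 32 GB of memory.
    # Requires a Snakemake version that supports --set-resources (7.x or later).
    snakemake --set-resources markDuplicates:mem_mb=32000 <usual DROP arguments>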

vyepez88 commented 7 months ago

Hi, I think there was an issue with maskMultiVCF because a path couldn't be accessed; it is working now, so that could have been the cause. 180 GB for 10 samples should be more than enough. You could also add specific resource allocations to the headers of the scripts, for example along the lines of the sketch below.
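A minimal sketch of what such a header could look like for a SLURM submission script (the resource values are placeholders, not DROP defaults):

    #!/bin/bash
    #SBATCH --job-name=drop_rvc     # hypothetical job name
    #SBATCH --cpus-per-task=10      # cores for the whole pipeline run
    #SBATCH --mem=180G              # memory for the whole job
    #SBATCH --time=24:00:00         # wall-time limit

    # ... then launch the DROP/Snakemake workflow as usual
    snakemake --cores 10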

gianfilippo commented 7 months ago

Hi, just completed a rerun and it worked. Thanks!