Running griffin_GC_counts.py on a high coverage bam

adoebley / Griffin

A flexible framework for nucleosome profiling of cell-free DNA

Running griffin_GC_counts.py on a high coverage bam #2

Status: Closed (shahmj closed this issue 2 years ago)

shahmj commented 2 years ago

Hi,

I'm wondering if there is a way to speed up getting GC counts on a high-coverage BAM. I have a plasma BAM file with 30-40x mean coverage and would like to check whether there is GC bias by fragment size. Is there a way to split the work by chromosome, or some other way to speed up the process?

Thanks, Minita

adoebley commented 2 years ago

Hi Minita,

If you're using the Slurm workload manager, you can parallelize the process by specifying more than 1 CPU for the GC_counts step in 'cluster_slurm.yaml'. The default is 8 CPUs.

https://github.com/adoebley/Griffin/blob/796b059b791875eca9f004dcbf1cbb0e928a9e92/snakemakes/griffin_GC_correction/config/cluster_slurm.yaml#L16-L19

When I've worked with ~30-35x BAMs, they take 7-8 hours to run with 8 CPUs.

Best, Anna-Lisa
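
For readers wondering what "split the work and use several CPUs" can look like in general, here is a minimal Python sketch. It is not Griffin's implementation: the function names (`count_gc_by_length`, `merge_counts`, `gc_counts_parallel`) and the input layout (a dict mapping chromosome name to a list of fragment sequences) are hypothetical and chosen only for illustration. The idea is that each chunk is counted independently and the per-chunk counts are then summed.

```python
from collections import Counter
from multiprocessing import Pool


def count_gc_by_length(fragments):
    """Count (fragment_length, number_of_GC_bases) pairs for one chunk.

    `fragments` is a list of DNA strings; this is a toy stand-in for
    fragments that would normally be read from a BAM file.
    """
    counts = Counter()
    for seq in fragments:
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        counts[(len(seq), gc)] += 1
    return counts


def merge_counts(per_chunk_counts):
    """Sum the per-chunk counters into a single counter."""
    total = Counter()
    for counts in per_chunk_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def gc_counts_parallel(fragments_by_chrom, cpus=8):
    """Count each chromosome's fragments in a separate worker, then merge.

    `fragments_by_chrom` (hypothetical layout): {chrom_name: [sequence, ...]}.
    `cpus=8` mirrors the default mentioned in this thread; it is only a
    default for this sketch.
    """
    chunks = list(fragments_by_chrom.values())
    with Pool(processes=cpus) as pool:
        per_chrom = pool.map(count_gc_by_length, chunks)
    return merge_counts(per_chrom)
```

When run as a script, `gc_counts_parallel` should be called from under an `if __name__ == "__main__":` guard, as is usual for `multiprocessing`. Because the merge step is a plain sum, the result does not depend on how the fragments were divided among workers.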

shahmj commented 2 years ago

Great, thanks!