brentp / excord

extract SV signal from a BAM
Apache License 2.0

Excord runtime/mem resources #3

Status: open. Phillip-a-richmond opened this issue 2 years ago.

Phillip-a-richmond commented 2 years ago

Hey Brent,

I'm looking to run Excord on 1kG data from the NYU remap to GRCh38+alt.

I have two questions:

  1. Can Excord work with streamed CRAMs?
  2. Roughly how long does Excord take, and how much RAM does it need, if I run a command like this?

      samtools view -b -T "$Fasta_Dir/$Fasta_File" "$CRAM_Dir/$Sample_ID.cram" \
          | "$CRAM_Dir/excord" \
              --discordantdistance 500 \
              --fasta "$Fasta_Dir/$Fasta_File" \
              /dev/stdin \
          | LC_ALL=C sort --buffer-size 2G -k1,1 -k2,2n -k3,3n \
          | bgzip -c > "$CRAM_Dir/$Sample_ID.bed.gz"

I'm trying to choose machine types that make the most efficient use of cloud resources. Essentially, I'm pulling each GRCh38 CRAM from S3 to onboard NVMe storage, running this, and saving the output. Then I'll aggregate the results with stix across however many samples I can afford to run (assuming a budget of ~$3k).
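For a fixed budget, the number of samples you can afford follows from simple arithmetic: cost per sample = (instance price / samples running at once) x (hours per sample). The sketch below does that math in whole cents and minutes so bash integer arithmetic stays exact. The helper name `samples_affordable` is hypothetical, and every input is a placeholder: the 90 minutes and the 80 cents/hour instance price are not measurements or quoted prices. It also assumes per-sample runtime holds when several jobs share an instance, and it ignores S3 transfer, storage, and idle time.

```shell
#!/usr/bin/env bash
# Rough sample count for a fixed cloud budget (hypothetical helper).
# Ignores S3 transfer, storage, and idle time; all inputs are placeholders.

# samples_affordable BUDGET_USD MINUTES_PER_SAMPLE INSTANCE_CENTS_PER_HOUR JOBS_PER_INSTANCE
samples_affordable() {
    local budget_usd=$1 minutes=$2 cents_per_hour=$3 jobs=$4
    # Cost per sample in cents = (cents_per_hour / jobs) * (minutes / 60), so
    # samples = budget_cents * 60 * jobs / (minutes * cents_per_hour).
    # Integer division floors the result, i.e. only whole samples count.
    echo $(( budget_usd * 100 * 60 * jobs / (minutes * cents_per_hour) ))
}

# Example: $3,000 budget, 90 min per sample, hypothetical 80 cents/hour
# instance running 8 samples at once -> prints 20000
samples_affordable 3000 90 80 8
```

Swap in your measured per-sample time and the actual instance price once you have them. The result is an upper bound, because real runs will also spend time on downloads and uploads.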

Thanks, Phil

brentp commented 2 years ago

Hi Phil, it has been a while, but I would think that 4G of RAM would be more than enough. Maybe @ryanlayer has more recent results on this.
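To turn a per-job memory figure into a machine choice, you can compute how many pipelines fit on one instance. The count is whichever limit is tighter: vCPUs divided by cores per job, or RAM divided by RAM per job. In the sketch below, the 6 GB per job is an assumption: it combines the 4G upper estimate above for excord with the `--buffer-size 2G` given to sort in the original command. The 2 cores per job is also a guess, since the pipeline runs samtools, excord, sort, and bgzip as separate processes. `jobs_per_instance` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope packing: how many excord pipelines fit on one instance?
# All per-job figures are assumptions, not measurements.

# jobs_per_instance VCPUS RAM_GB CORES_PER_JOB RAM_PER_JOB_GB  (hypothetical helper)
jobs_per_instance() {
    local vcpus=$1 ram_gb=$2 cores_per_job=$3 ram_per_job_gb=$4
    local by_cpu=$(( vcpus / cores_per_job ))    # integer division floors
    local by_ram=$(( ram_gb / ram_per_job_gb ))
    # The tighter of the two limits decides the job count.
    if (( by_cpu < by_ram )); then
        echo "$by_cpu"
    else
        echo "$by_ram"
    fi
}

# Example: 16 vCPUs, 64 GB RAM, 2 cores and 6 GB per job -> min(8, 10) = 8
jobs_per_instance 16 64 2 6
```

If the answer is CPU-limited on every instance type you consider, a compute-optimized type is likely a better fit than a memory-optimized one.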

brentp commented 2 years ago

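...and yes, it can work with streamed CRAMs. samtools view -u (uncompressed BAM output) might be a bit faster than -b, since it skips compressing a stream that excord immediately decompresses.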
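Below is a sketch of the command from the question with -b switched to -u as suggested, wrapped in a function and run with `pipefail` so a failure anywhere in the pipe is not masked. The function name `run_excord_streamed` is hypothetical. The sketch assumes samtools, excord, sort, and bgzip are on PATH; the original command called excord from `$CRAM_Dir`. The excord options are copied unchanged from the question.

```shell
#!/usr/bin/env bash
# Sketch: stream a CRAM through excord without writing an intermediate BAM.
# Assumes samtools, excord, sort and bgzip are on PATH (hypothetical setup).
set -euo pipefail

# run_excord_streamed CRAM FASTA OUT_BED_GZ  (hypothetical helper)
run_excord_streamed() {
    local cram=$1 fasta=$2 out=$3
    # -u emits uncompressed BAM; -T gives the reference needed to decode CRAM.
    samtools view -u -T "$fasta" "$cram" \
        | excord --discordantdistance 500 --fasta "$fasta" /dev/stdin \
        | LC_ALL=C sort --buffer-size 2G -k1,1 -k2,2n -k3,3n \
        | bgzip -c > "$out"
}
```

Usage, with the variables from the question:

    run_excord_streamed "$CRAM_Dir/$Sample_ID.cram" "$Fasta_Dir/$Fasta_File" "$CRAM_Dir/$Sample_ID.bed.gz"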

Phillip-a-richmond commented 2 years ago

On a single thread, can you ballpark the wall time needed to run Excord on a 30x CRAM?

Phillip-a-richmond commented 2 years ago

And if I gave samtools view extra threads with -@ $Threads, would that speed up Excord?

brentp commented 2 years ago

I would guess about 1.5 hours, but again, @ryanlayer might have more input. I think excord would likely be the bottleneck, so additional samtools view threads probably wouldn't help.
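One way to check the bottleneck claim on a test sample is to time the samtools-to-excord part of the pipe at a few samtools thread counts. If excord is the limiting step, the timings should come out close to each other. In the sketch below, `bench_samtools_threads` is a hypothetical helper and the thread counts 0, 2, and 4 are arbitrary. It assumes samtools and excord are on PATH. Output is discarded so that sort and bgzip do not affect the timing. Each iteration processes the whole CRAM, so a smaller test file keeps this cheap.

```shell
#!/usr/bin/env bash
# Sketch: does giving samtools extra threads change excord's wall time?
# Assumes samtools and excord are on PATH; helper name is hypothetical.
set -o pipefail

# bench_samtools_threads CRAM FASTA
bench_samtools_threads() {
    local cram=$1 fasta=$2 t start
    for t in 0 2 4; do
        start=$SECONDS
        # -@ N asks samtools view for N threads beyond its main thread.
        # Output goes to /dev/null so only samtools + excord are timed.
        samtools view -u -@ "$t" -T "$fasta" "$cram" \
            | excord --discordantdistance 500 --fasta "$fasta" /dev/stdin \
            > /dev/null || return 1
        echo "samtools -@ $t: $(( SECONDS - start )) s"
    done
}
```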

Phillip-a-richmond commented 2 years ago

OK, thanks! Very useful. I'll run some tests before scaling up.

Cool tool and paper; hopefully this approach gets us closer to finding rare SVs in a rare-disease context.