AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

WGS non-human data -- taking a long time to call a single chromosome #370

Open RNAseqer opened 2 years ago

RNAseqer commented 2 years ago

Hi there,

I'm currently running VarDict on the Terra cloud, with each chromosome running in parallel. Unfortunately, one chromosome (1 out of 39) is giving me trouble: the run doesn't quite fail outright, but it hits an out-of-memory issue. All of the other chromosomes complete fairly quickly.

Here's the code that I'm using:

export JAVA_OPTS="-Xmx42G -XX:ParallelGCThreads=1"
vardict-java \
-th 1 \
-G /cromwell_root/gc_bucket/Canis_lupus_familiaris_assembly3_2019.fasta \
--fisher \
-N CCB030088-T \
-b "/cromwell_root/gc_bucket/4cdacb94-f618-47b8-9664-09f4799168ed/PreProcessingForVariantDiscovery_GATK4/eb8db70c-bd2a-4662-9b90-1b924f66375f/call-GatherBamFiles/CCB030088-T.bam|/cromwell_root/gc_bucket/d1a910ee-8e6e-40b7-b157-412ffc19e50a/PreProcessingForVariantDiscovery_GATK4/dce66539-49d2-4686-9d15-88007a7dbf77/call-GatherBamFiles/CCB030088-N.bam" \
-c 1 \
-S 2 \
-E 3 \
/cromwell_root/gc_bucket/chr3_2.bed | \
var2vcf_paired.pl \
-N "CCB030088-T|CCB030088-N" \
-M \
-A \
-Q 20.0 \
-d 8 \
-v 4 \
-f 0.02

Any suggestions would be very helpful. I've tried a few samples, and the issue consistently shows up in the same chr3 region. Also, I have split up the BED file as recommended, into 50 kb windows with a 150 bp overlap.
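For reference, a windowed BED file of the kind described above could be generated roughly as follows. This is only a minimal sketch: the function name `make_windows`, the contig name `chrTest`, its length, and the assumption that the split means 50,000 bp windows extended by 150 bp (0-based, half-open coordinates) are all illustrative and not taken from the actual pipeline.

```shell
# make_windows CHROM LENGTH [WINDOW] [OVERLAP]   (hypothetical helper)
# Prints tab-separated chrom/start/end lines. Each window starts WINDOW bases
# after the previous one and is extended by OVERLAP bases, clipped to LENGTH.
make_windows() {
    awk -v chrom="$1" -v len="$2" -v win="${3:-50000}" -v ovl="${4:-150}" 'BEGIN {
        OFS = "\t"
        for (start = 0; start < len; start += win) {
            end = start + win + ovl
            if (end > len) end = len      # clip the last window to the contig end
            print chrom, start, end
        }
    }'
}

# Example with a made-up 120,000 bp contig
make_windows chrTest 120000
```

Redirecting the output to a file gives a three-column BED that matches the `-c 1 -S 2 -E 3` column settings in the command above.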

Thank you for your help!

RNAseqer commented 2 years ago

Hi again, I looked more closely at the chromosome, and the problem region appears to be a tandem repeat. Do you suggest removing such regions? It was fairly short, but it caused the job to never finish.
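If excluding the region turns out to be acceptable, one possible way to do it is to filter the BED file before calling. The sketch below is an assumption-laden illustration, not a recommendation from the maintainers: `drop_overlapping` is a hypothetical helper, the coordinates are made up, and intervals are treated as 0-based, half-open.

```shell
# drop_overlapping CHROM START END < in.bed   (hypothetical helper)
# Prints every BED line except those that overlap CHROM:START-END.
drop_overlapping() {
    awk -v c="$1" -v s="$2" -v e="$3" 'BEGIN { FS = OFS = "\t" }
        # keep the line unless it is on the same contig and the intervals intersect
        !($1 == c && $2 < e && $3 > s)'
}

# Example with made-up windows and a made-up repeat at chr3:60000-61000
printf 'chr3\t0\t50150\nchr3\t50000\t100150\nchr3\t100000\t150150\n' |
    drop_overlapping chr3 60000 61000
```

Note that this drops every window touching the repeat in full, so a short repeat removes a whole window; trimming the windows around the repeat instead would keep more of the chromosome callable.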