brentp / smoove

structural variant calling and genotyping with existing tools, but, smoothly.
Apache License 2.0

lumpy-filter removes too many reads #205

Open yefanglee opened 1 year ago

yefanglee commented 1 year ago

Hi, when I ran the first command, lumpy-filter removed more than 90% of the reads from my BAM file. Is there something wrong with my original BAM file? Is it possible to change the filtering parameters? The process log is as follows:

$ smoove call --outdir results/ --name ZWG7 --fasta GCA_001704415.1_ARS1_genomic.fna -p 1 --genotype ZWG7.sorted.uniqe.dedup.bam
[smoove] 2022/10/10 09:20:44 starting with version 0.2.8
[smoove] 2022/10/10 09:20:50 calculating bam stats for 1 bams
[smoove] 2022/10/10 09:21:54 done calculating bam stats
[smoove]: 2022/10/10 09:26:57 finished process: lumpy-filter (set -eu; lumpy_filter -f /data/liyf/reference/GCA_001704415.1_ARS1_genomic.fna /data/liyf/data) in user-time:10m19.856767s system-time:46.940863s
[smoove] 2022/10/10 09:49:14 removed 287488 alignments out of 2188411 (13.14%) with low mapq, depth > 1000, or from excluded chroms from ZWG7.disc.bam in 1337 seconds
[smoove] 2022/10/10 09:49:14 removed 341630 alignments out of 2188411 (15.61%) that were bad interchromosomals or flanked-splitters from ZWG7.disc.bam
[smoove] 2022/10/10 09:50:02 kept 8787 putative orphans
[smoove] 2022/10/10 09:50:02 removed 499606 discordant orphans in 28 seconds
[smoove] 2022/10/10 09:50:18 removed 1469059 singletons and isolated interchromosomals of 1559293 reads (94.21%) from ZWG7.disc.bam in 64 seconds
[smoove] 2022/10/10 09:50:18 90234 reads (4.12%) of the original 2188411 remain from ZWG7.disc.bam
[smoove] 2022/10/10 09:59:40 removed 16755 alignments out of 141102 (11.87%) with low mapq, depth > 1000, or from excluded chroms from ZWG7.split.bam in 560 seconds
[smoove] 2022/10/10 09:59:41 removed 32891 alignments out of 141102 (23.31%) that were bad interchromosomals or flanked-splitters from ZWG7.split.bam
[smoove] 2022/10/10 09:59:44 kept 777 putative orphans
[smoove] 2022/10/10 09:59:44 removed 86 split orphans in 1 seconds
[smoove] 2022/10/10 09:59:52 removed 88760 singletons of 91456 reads (97.05%) from ZWG7.split.bam in 11 seconds
[smoove] 2022/10/10 09:59:52 2696 reads (1.91%) of the original 141102 remain from ZWG7.split.bam
[smoove] 2022/10/10 09:59:57 starting lumpy
[smoove] 2022/10/10 09:59:57 wrote lumpy command to results//ZWG7-lumpy-cmd.sh
[smoove] 2022/10/10 09:59:57 writing sorted, indexed file to results/ZWG7-smoove.genotyped.vcf.gz
[smoove] 2022/10/10 09:59:57 excluding variants with all unknown or homozygous reference genotypes
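The retained fractions reported in the log can be recomputed from its "remain from" lines, which is a quick way to compare filtering across samples. The percentages are relative to the reads in ZWG7.disc.bam and ZWG7.split.bam, as the log lines themselves state. The following is a minimal sketch that assumes the message wording shown above; the function name `summarize_remaining` is hypothetical and not part of smoove.

```python
import re

# Matches summary lines of the form seen in the log above, e.g.
# "90234 reads (4.12%) of the original 2188411 remain from ZWG7.disc.bam"
REMAIN_RE = re.compile(
    r"(\d+) reads \(([\d.]+)%\) of the original (\d+) remain from (\S+)"
)


def summarize_remaining(log_text):
    """Return {bam_name: (kept, original, percent_kept)} parsed from log text.

    Hypothetical helper: it relies only on the message wording shown in the
    log above, which is an assumption about other smoove versions.
    """
    summary = {}
    for kept, _reported_pct, original, bam in REMAIN_RE.findall(log_text):
        kept, original = int(kept), int(original)
        # Recompute the percentage instead of trusting the printed value.
        summary[bam] = (kept, original, round(100.0 * kept / original, 2))
    return summary


if __name__ == "__main__":
    log = (
        "[smoove] 2022/10/10 09:50:18 90234 reads (4.12%) of the original "
        "2188411 remain from ZWG7.disc.bam\n"
        "[smoove] 2022/10/10 09:59:52 2696 reads (1.91%) of the original "
        "141102 remain from ZWG7.split.bam\n"
    )
    for bam, (kept, original, pct) in summarize_remaining(log).items():
        print(f"{bam}: kept {kept} of {original} ({pct}%)")
```

Run against a saved log file, this gives one line per filtered BAM, so unusually low retention in a single sample stands out.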

brentp commented 1 year ago

That's a bit high, but not unexpected. If all of those are left in, they'll result in spurious calls.

yefanglee commented 1 year ago

> That's a bit high, but not unexpected. If all of those are left in, they'll result in spurious calls.

Thanks for your reply.

yefanglee commented 1 year ago

> That's a bit high, but not unexpected. If all of those are left in, they'll result in spurious calls.

Hi, I have a small question. How many samples can be merged together with the latest version of smoove (v0.2.8)? I am going to select ~500 high-depth samples (>15X) to call SVs. Is it possible to merge all of them successfully?

brentp commented 1 year ago

Yes, 500 will probably work. It's simply using bcftools merge. Sometimes it can stall, but it's simple to merge the sample columns with a script if bcftools merge fails.
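As an illustration of the fallback mentioned above, here is a minimal sketch of merging sample columns with a script. It is not smoove's code. It assumes every input VCF lists exactly the same sites in the same order (for example, each sample genotyped at one shared site list) and the same FORMAT field per site. The function name `paste_sample_columns` is hypothetical, and the INFO column is simply copied from the first input rather than recomputed.

```python
def paste_sample_columns(vcf_texts):
    """Combine VCF texts that list identical sites into one multi-sample VCF.

    Assumptions (not checked beyond what is coded below): same sites, same
    order, same FORMAT per site. "##" metadata lines and the INFO column are
    taken from the first input only.
    """
    if not vcf_texts:
        raise ValueError("no VCF inputs given")
    meta, header, merged = [], None, None
    for i, text in enumerate(vcf_texts):
        columns, records = None, []
        for line in text.splitlines():
            if line.startswith("##"):
                if i == 0:
                    meta.append(line)  # keep metadata from the first input
            elif line.startswith("#CHROM"):
                columns = line.split("\t")
            elif line:
                records.append(line.split("\t"))
        if columns is None:
            raise ValueError(f"input {i} has no #CHROM header line")
        if merged is None:
            header, merged = columns, records
            continue
        if len(records) != len(merged):
            raise ValueError(f"input {i} has a different number of records")
        header.extend(columns[9:])  # sample names start at column 10
        for base, rec in zip(merged, records):
            # CHROM, POS, REF, ALT and FORMAT must agree before pasting.
            key_base = (base[0], base[1], base[3], base[4], base[8])
            key_rec = (rec[0], rec[1], rec[3], rec[4], rec[8])
            if key_base != key_rec:
                raise ValueError(f"input {i} differs at {rec[0]}:{rec[1]}")
            base.extend(rec[9:])  # append this input's genotype columns
    rows = meta + ["\t".join(header)] + ["\t".join(r) for r in merged]
    return "\n".join(rows) + "\n"
```

Because the sketch refuses to continue when sites or FORMAT fields disagree, a mismatch shows up as an error instead of a silently misaligned genotype column. For compressed inputs, the text could be read with `gzip.open(path, "rt")` before calling the function.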

FarmOmics commented 1 year ago

I have the same issue. Is it possible to set the MapQ and depth parameters ourselves?

yefanglee commented 1 year ago

> I have the same issue. Is it possible to set the MapQ and depth parameters ourselves?

I didn't set any parameters; I just merged all the output files.