Closed ggoodstudydaydayup closed 2 years ago
can you show the full stderr and stdout when you run smoove? make sure to use a clean --outdir.
Here is the full stderr and stdout: "[smoove] 2022/04/06 09:46:18 starting with version 0.2.7 [smoove] 2022/04/06 09:46:18 calculating bam stats for 1 bams [smoove] 2022/04/06 09:46:25 done calculating bam stats [smoove]: ([E]lumpy-filter) 2022/04/06 09:52:17 [lumpy_filter] extracted splits and discordants from 292002619 total aligned reads [smoove]:2022/04/06 09:52:25 finished process: lumpy-filter (set -eu; lumpy_filter -f /data/liuyu/Script/SV/Fei_lab/SV/SV_calling/Msie_Msyl/ref/Msyl_Rfor_compare) in user-time:11m29.628743s system-time:49.598735s [smoove] 2022/04/06 10:10:12 removed 110332034 alignments out of 290199195 (38.02%) with low mapq, depth > 1000, or from excluded chroms from GD.disc.bam in 1068 seconds [smoove] 2022/04/06 10:10:12 removed 9147921 alignments out of 290199195 (3.15%) that were bad interchromosomals or flanked-splitters from GD.disc.bam [smoove] 2022/04/06 10:41:00 kept 841340 putative orphans [smoove] 2022/04/06 10:41:00 removed 598349 discordant orphans in 935 seconds [smoove] 2022/04/06 11:00:55 removed 20509144 singletons and isolated interchromosomals of 170719240 reads (12.01%) from GD.disc.bam in 3043 seconds [smoove] 2022/04/06 11:00:55 150210096 reads (51.76%) of the original 290199195 remain from GD.disc.bam [smoove] 2022/04/06 11:02:45 removed 6909921 alignments out of 11301778 (61.14%) with low mapq, depth > 1000, or from excluded chroms from GD.split.bam in 76 seconds [smoove] 2022/04/06 11:02:45 removed 1248549 alignments out of 11301778 (11.05%) that were bad interchromosomals or flanked-splitters from GD.split.bam [smoove] 2022/04/06 11:04:41 kept 457383 putative orphans [smoove] 2022/04/06 11:04:41 removed 34566 split orphans in 78 seconds [smoove] 2022/04/06 11:04:58 removed 1099985 singletons of 3143308 reads (34.99%) from GD.split.bam in 133 seconds [smoove] 2022/04/06 11:04:58 2043323 reads (18.08%) of the original 11301778 remain from GD.split.bam [smoove] 2022/04/06 11:04:59 starting lumpy [smoove] 2022/04/06 11:04:59 wrote lumpy command to results-smoove//GD-lumpy-cmd.sh [smoove] 2022/04/06 11:04:59 writing sorted, indexed file to results-smoove/GD-smoove.genotyped.vcf.gz [smoove] 2022/04/06 11:04:59 excluding variants with all unknown or homozygous reference genotypes [smoove] 2022/04/06 11:04:59 missing pair end parameters:mean stdev Program: ** (v 0.2.13) Author: Ryan Layer (rl6sf@virginia.edu) Summary: Find structural variations in various signals.
Usage: ** [OPTIONS]
Options:
-g Genome file (defines chromosome order)
[smoove] 2022/04/06 11:04:59 -e Show evidence for each call
-w File read windows size (default 1000000)
-mw minimum weight for a call
-msw minimum per-sample weight for a call
-tt trim threshold
-x exclude file bed file
-t temp file prefix, must be to a writeable directory
-P output probability curve for each variant
-b output BEDPE instead of VCF
-sr bam_file:
-pe bam_file:<file name>,
id:<sample name>,
histo_file:<file name>,
mean:<value>,
stdev:<value>,
read_length:<length>,
min_non_overlap:<length>,
discordant_z:<z value>,
back_distance:<distance>,
min_mapping_threshold:<mapping quality>,
weight:<sample weight>,
read_group:<string>
-bedpe bedpe_file:<bedpe file>,
id:<sample name>,
weight:<sample weight>
[smoove] 2022/04/06 11:05:01 > gsort version 0.1.4 [smoove] 2022/04/06 11:05:07 panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation [smoove] 2022/04/06 11:05:07 code=0x1 addr=0x8 pc=0x546d43] " And the lumpy-cmd.sh like this : "set -euo pipefail set -euo pipefail; lumpy -msw 4 -mw 4 -t $(mktemp) -tt 0 -P -pe id:GD,bam_file:results-smoove//GD.disc.bam,histo_file:results-smoove//GD.histo,mean:0.00,stdev:0.00,read_length:150,min_non_overlap:150,discordant_z:2.75,back_distance:30,weight:1,min_mapping_threshold:20 -sr id:GD,bam_file:results-smoove//GD.split.bam,back_distance:10,weight:1,min_mapping_threshold:20"
Thanks in advance.
are the reads pairs? can you share the output of samtools flagstat?
Thanks for your reply. We used pair-end reads. Here is the result of samtools flagstat: 300886963 + 0 in total (QC-passed reads + QC-failed reads) 6062867 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 292002619 + 0 mapped (97.05% : N/A) 294824096 + 0 paired in sequencing 147412048 + 0 read1 147412048 + 0 read2 0 + 0 properly paired (0.00% : N/A) 284284226 + 0 with itself and mate mapped 1655526 + 0 singletons (0.56% : N/A) 81845926 + 0 with mate mapped to a different chr 23920134 + 0 with mate mapped to a different chr (mapQ>=5)
somehow you have 0 properly paired reads. that's the problem. smoove and lumpy use those to determine the expected range of insert sizes.
I'm not sure why the command line of lumpy could be used to create "histo" file. samtools view -r ${sample} ${sample}.bam | tail -n+100000 | pairend_distro.py -r 150 -X 4 -N 10000 -o sample.lib1.histo Is that right?
I don't know about that script. If you want to use smoove, you'll need a bam file that has some reads with a flag indicating proper pair. There's likely something wrong with your data, so I wouldn't try to circumvent this.
Thanks!
Hello @brentp,
I ran smoove like this --- smoove call --outdir results-smoove/ --name $sample --fasta $reference -p 1 --genotype ../${sample}.bam
For each BAM file, I have a "disc.bam", "disc.bam.bai", "disc.bam.orig.bam" and a "histo" file created. However, the histo files are empty. I check the bam file and there was nothing wrong. Meanwhile, when I ran like this --- samtools view -r ${sample} ${sample}.bam \ | tail -n+100000 \ | pairend_distro.py \ -r 150 \ -X 4 \ -N 10000 \ -o sample.lib1.histo The "histo" file was created and the mean value was calculated.
Could you please give me some advice to solve this problem?
Best wishes, Liu