Open robertzeibich opened 10 months ago
I would suggest to use the duphold values like DHFFC for filtering rather than SHQ. But if you do want to use it, can you share a variant so I can see what you mean?
Sure
chr1 1666979 3 N <DEL> . . SVTYPE=DEL;STRANDS=+-:6;SVLEN=-202;END=1667181;CIPOS=-30,196;CIEND=-201,29;CIPOS95=-5,41;CIEND95=-70, 4;IMPRECISE;SU=6;PE=6;SR=0;PRPOS=1.94054e-09,3.5228e-09,6.39347e-09,1.15977e-08,2.10337e-08,3.81036e-08,6.8987e-08,1.24837e-07,2.25845e-07,4.08063e-07,7.3695 e-07,1.33009e-06,2.39973e-06,4.32612e-06,7.79666e-06,1.40352e-05,2.52488e-05,4.54019e-05,8.1568e-05,0.00014643,0.000262604,0.000470833,0.000843363,0.00150982 ,0.00270086,0.00482733,0.00861743,0.0153798,0.0274186,0.0359165,0.0470137,0.045247,0.0434684,0.0417257,0.0400227,0.0383638,0.0367318,0.0351409,0.0335678,0.03 20544,0.0305828,0.0291511,0.0277528,0.0263923,0.0250728,0.0238056,0.0225849,0.0213844,0.0202263,0.0191107,0.0180412,0.0170062,0.0160146,0.0150715,0.0141742,0 .0133196,0.0124999,0.0117177,0.0109715,0.0102595,0.00958614,0.00894291,0.00833596,0.00775814,0.00721409,0.00670673,0.00622937,0.00577541,0.0053489,0.00494703 ,0.00456942,0.00421845,0.00388762,0.00357949,0.00328931,0.00302315,0.00277422,0.00254228,0.00232682,0.00212812,0.00194336,0.00177233,0.00161485,0.00147069,0. 00133636,0.00121358,0.00110055,0.000997179,0.000901402,0.000814204,0.000734701,0.000662464,0.000596335,0.000535921,0.000481279,0.000431895,0.000387482,0.0003 4682,0.000310174,0.000277093,0.000247184,0.000220231,0.000196202,0.000174509,0.000154848,0.000137414,0.000122082,0.0001082,9.58199e-05,8.45953e-05,7.47519e-0 5,6.58245e-05,5.78675e-05,5.08174e-05,4.46069e-05,3.90589e-05,3.41848e-05,2.99071e-05,2.61335e-05,2.2817e-05,1.98987e-05,1.73055e-05,1.5063e-05,1.30982e-05,1 .13761e-05,9.8337e-06,8.52081e-06,7.38187e-06,6.40051e-06,5.52535e-06,4.76828e-06,4.10732e-06,3.54353e-06,3.04909e-06,2.61766e-06,2.24226e-06,1.9234e-06,1.64 77e-06,1.4075e-06,1.20104e-06,1.02558e-06,8.74974e-07,7.46385e-07,6.35902e-07,5.40234e-07,4.573e-07,3.87769e-07,3.28302e-07,2.77978e-07,2.3432e-07,1.98266e-0 7,1.66696e-07,1.4069e-07,1.18052e-07,9.95003e-08,8.35258e-08,7.00463e-08,5.86491e-08,4.92874e-08,4.13363e-08,3.4389e-08,2.87651e-08,2.39753e-08,1.99732e-08,1 .65566e-08,1.38008e-08,1.1481e-08,9.53977e-09,7.91739e-09,6.61129e-09,5.50405e-09,4.5665e-09,3.78319e-09,3.14497e-09,2.59712e-09,2.14727e-09,1.77158e-09,1.46 583e-09,1.20275e-09,9.93711e-10,8.17587e-10,6.73413e-10,5.52689e-10,4.5296e-10,3.7154e-10,3.03631e-10,2.47954e-10,2.01176e-10,1.64171e-10,1.34023e-10,1.093e- 10,8.91016e-11,7.2747e-11,5.91578e-11,4.802e-11,3.89415e-11,3.15325e-11,2.52383e-11,2.04493e-11,1.63367e-11,1.30483e-11,1.03707e-11,8.37709e-12,6.66609e-12,5 .27686e-12,4.21012e-12,3.37785e-12,2.67151e-12,2.11162e-12,1.67425e-12,1.34876e-12,1.05594e-12,8.36908e-13,6.58145e-13,5.23885e-13,4.09982e-13,3.21031e-13,2. 49168e-13,1.92059e-13,1.47045e-13,1.10975e-13,8.47616e-14,6.20617e-14,4.58748e-14,3.32256e-14,2.4066e-14,1.68237e-14;PREND=1.16493e-11,1.43876e-11,1.77304e-1 1,2.20113e-11,2.72982e-11,3.37463e-11,4.15877e-11,5.16645e-11,6.37343e-11,7.89015e-11,9.72796e-11,1.20625e-10,1.48531e-10,1.8275e-10,2.25798e-10,2.78345e-10, 3.42522e-10,4.2006e-10,5.1739e-10,6.33153e-10,7.79946e-10,9.54462e-10,1.17288e-09,1.43547e-09,1.75632e-09,2.14963e-09,2.63445e-09,3.2164e-09,3.92185e-09,4.76 661e-09,5.80303e-09,7.05294e-09,8.5846e-09,1.04416e-08,1.26696e-08,1.53493e-08,1.86427e-08,2.25572e-08,2.72769e-08,3.28591e-08,3.96665e-08,4.78175e-08,5.7585 e-08,6.9084e-08,8.2999e-08,9.95299e-08,1.19744e-07,1.43297e-07,1.71332e-07,2.04909e-07,2.44869e-07,2.92213e-07,3.48265e-07,4.14439e-07,4.92853e-07,5.85911e-0 7,6.95093e-07,8.22937e-07,9.72466e-07,1.14875e-06,1.35545e-06,1.59336e-06,1.8757e-06,2.20108e-06,2.57914e-06,3.01686e-06,3.52833e-06,4.12642e-06,4.81258e-06, 5.59727e-06,6.51014e-06,7.55983e-06,8.75836e-06,1.01446e-05,1.17354e-05,1.35336e-05,1.56008e-05,1.79545e-05,2.06578e-05,2.37024e-05,2.71487e-05,3.10711e-05,3 .54895e-05,4.04818e-05,4.60689e-05,5.23417e-05,5.9421e-05,6.73189e-05,7.60653e-05,8.58397e-05,9.67171e-05,0.000108855,0.000122299,0.000137177,0.000153638,0.0 00171694,0.000191611,0.000213744,0.000237718,0.000264086,0.000292625,0.00032401,0.000358132,0.000395333,0.000435559,0.000479081,0.000526007,0.000577183,0.000 632179,0.000691364,0.000754708,0.000822328,0.000894761,0.000972364,0.00105493,0.00114296,0.00123648,0.00133576,0.00144115,0.00155392,0.00167222,0.00179607,0. 00192669,0.00206393,0.0022089,0.00235792,0.00251429,0.00267896,0.00285022,0.00302834,0.00321272,0.00340377,0.00360188,0.0038082,0.00402021,0.00423954,0.00446 41,0.00469728,0.00493775,0.00518361,0.00543329,0.00568866,0.00595346,0.00622102,0.00649151,0.00676985,0.00705192,0.00733642,0.00762634,0.00791954,0.00821555, 0.00851802,0.00882087,0.00912685,0.00943192,0.00974282,0.0100556,0.0103661,0.010679,0.0109917,0.0113052,0.0116203,0.0119356,0.012243,0.0125527,0.0128624,0.01 31705,0.0134696,0.0137714,0.0140726,0.0143717,0.014665,0.0149567,0.0152453,0.0155329,0.0158114,0.0160891,0.0163606,0.0166273,0.0168918,0.017155,0.0174102,0.0 176603,0.0179087,0.0181521,0.0183899,0.0186198,0.0188457,0.0190677,0.0192878,0.0195002,0.0197075,0.019912,0.0201097,0.0203043,0.0204951,0.0206781,0.0208558,0 .0210314,0.0212025,0.0213654,0.0215219,0.0159421,0.0118049,0.00642711,0.00349906,0.00190448,0.00103648,0.000563912,0.000306741,0.000166833,9.07308e-05,4.9334 6e-05,2.68181e-05,1.45728e-05,5.82464e-06,2.32765e-06,9.29999e-07,3.71552e-07,1.48367e-07,4.35781e-08,1.2799e-08,3.75884e-09,1.10367e-09,3.2406e-10,9.51338e- 11,2.7928e-11,8.19759e-12,2.40606e-12,7.06057e-13,2.07181e-13;MSHQ=-1;smoove_gene=SLC35E2B|gene:1:203 GT:SU:PE:SR:SHQ ./.:6:6:0:-1
Based on the smoove instruction, I was hoping to get different values:
Hi, your sample is not HET so the value is -1. But smoove also requires other fields: https://github.com/brentp/smoove/blob/master/annotate/annotate.go#L349-L377
you probably need to run smoove call with the --genotype
flag.
I used the genotype flag. These are the exact commands I used:
smoove call --outdir ${outdir} --name ${SAMPLE} --fasta ${FASTA} -p $(nproc) --genotype ${BAM}
sh ${outdir}/${SAMPLE}-lumpy-cmd.sh | bgzip -c > ${outdir}/${SAMPLE}.vcf.gz
smoove annotate --gff ${GFF} ${outdir}/${SAMPLE}.vcf.gz | bgzip -c > ${outdir}/${SAMPLE}.annotated.vcf.gz
We also tried to run the docker container on our HPC cluster, but we are receiving the following error:
module load singularity singularity build smoove.sif docker://brentp/smoothe INFO: Starting build... FATAL: While performing build: conveyor failed to get: reading manifest latest in docker.io/brentp/smoothe: errors: denied: requested access to the resource is denied unauthorized: authentication required
I am assuming that the docker container might provide me with different results, but I might be mistaken here.
you have "smoothe" instead of "smoove"
you only need to run smoove call --genotype
do not run the lumpy_cmd.sh, that is run by smoove and when you run it, you overwrite the output of smoove.
Thanks for your replies. I am currently rerunning smoove. I will keep you posted on my progress.
I ran lumpy-cmd.sh in the past because I could not find a vcf outpuf file anywhere.
smoove call --outdir /scratch/xm41/ct/GREP_ANALYSIS/30x/SV/SMOOVE_RERUN/GRMH001 --name GRMH001 --fasta /scratch/xm41/hg38_resources/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta -p 1 --genotype /scratch/xm41/ct/bams/GRMH001.analy
sis.ready.bam
[smoove] 2023/08/11 18:09:55 starting with version 0.2.7
[smoove] 2023/08/11 18:09:56 calculating bam stats for 1 bams
[smoove] 2023/08/11 18:10:16 done calculating bam stats
[smoove]: ([E]lumpy-filter) 2023/08/11 20:33:34 [lumpy_filter] extracted splits and discordants from 1546746110 total aligned reads
[smoove]:2023/08/11 20:33:36 finished process: lumpy-filter (set -eu; lumpy_filter -f /scratch/xm41/hg38_resources/resources_broad_hg38_v0_Homo_sapiens_assembly3) in user-time:58m29.343551s system-time:10m27.852504s
[smoove] 2023/08/11 20:40:54 removed 11150202 alignments out of 43897434 (25.40%) with low mapq, depth > 1000, or from excluded chroms from GRMH001.disc.bam in 437 seconds
[smoove] 2023/08/11 20:40:54 removed 957442 alignments out of 43897434 (2.18%) that were bad interchromosomals or flanked-splitters from GRMH001.disc.bam
[smoove] 2023/08/11 21:01:07 kept 81974 putative orphans
[smoove] 2023/08/11 21:01:07 removed 11530441 discordant orphans in 1061 seconds
[smoove] 2023/08/11 21:03:10 removed 27214394 singletons and isolated interchromosomals of 31789790 reads (85.61%) from GRMH001.disc.bam in 1337 seconds
[smoove] 2023/08/11 21:03:10 4575396 reads (10.42%) of the original 43897434 remain from GRMH001.disc.bam
[smoove] 2023/08/11 21:04:06 removed 1417613 alignments out of 2787064 (50.86%) with low mapq, depth > 1000, or from excluded chroms from GRMH001.split.bam in 52 seconds
[smoove] 2023/08/11 21:04:06 removed 129065 alignments out of 2787064 (4.63%) that were bad interchromosomals or flanked-splitters from GRMH001.split.bam
[smoove] 2023/08/11 21:05:10 kept 100569 putative orphans
[smoove] 2023/08/11 21:05:10 removed 166204 split orphans in 51 seconds
[smoove] 2023/08/11 21:05:16 removed 611375 singletons of 1240386 reads (49.29%) from GRMH001.split.bam in 70 seconds
[smoove] 2023/08/11 21:05:16 629011 reads (22.57%) of the original 2787064 remain from GRMH001.split.bam
[smoove] 2023/08/11 21:05:17 starting lumpy
[smoove] 2023/08/11 21:05:17 wrote lumpy command to /scratch/xm41/ct/GREP_ANALYSIS/30x/SV/SMOOVE_RERUN/GRMH001/GRMH001-lumpy-cmd.sh
[smoove] 2023/08/11 21:05:17 gsort and bcftools required for svtyper
These are the current output files:
-rw-rw-rw-+ 1 rzei0002 xm41 138M Aug 11 21:03 GRMH001.disc.bam
-rw-rw-rw-+ 1 rzei0002 xm41 650K Aug 11 21:03 GRMH001.disc.bam.csi
-rw-rw-rw-+ 1 rzei0002 xm41 2.1G Aug 11 20:33 GRMH001.disc.bam.orig.bam
-rw-rw-rw-+ 1 rzei0002 xm41 6.8K Aug 11 18:10 GRMH001.histo
-rw-rw-rw-+ 1 rzei0002 xm41 544 Aug 11 21:05 GRMH001-lumpy-cmd.sh
-rw-rw-rw-+ 1 rzei0002 xm41 33M Aug 11 21:05 GRMH001.split.bam
-rw-rw-rw-+ 1 rzei0002 xm41 1.1M Aug 11 21:05 GRMH001.split.bam.csi
-rw-rw-rw-+ 1 rzei0002 xm41 279M Aug 11 20:33 GRMH001.split.bam.orig.bam
What am I doing wrong?
gsort and bcftools required for svtyper
so it is not able to genotype. you must install those.
what is the best way to install smoove? I tried conda install -c bioconda smoove
, but I am running into the previous mentioned svtyper and bcftools errors and I do not know how to fix them. Should I download the latest release and then install the required packages in a separate conda environment?
` [ ] bgzip [ sort -> (compress) -> index ] [ ] gsort [(sort) -> compress -> index ] [ ] tabix [ sort -> compress -> (index)] [ ] lumpy [ ] lumpy_filter [ ] samtools [ ] svtyper [ ] mosdepth [extra filtering of split and discordant files for better scaling]
[ ] duphold [(optional) annotate calls with depth changes] [ ] svtools [only needed for large cohorts]. `
For everyone who struggles with the installation:
conda create -n smoove
conda install -c bioconda smoove
conda install svtyper=0.7.0
RESULT
(smoove) [rzei0002@m3-login1 tests]$ smoove call -o to --processes 2 -x --genotype --fasta $REF --name NA24385-svs-cram --excludechroms '~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy' NA24385_chr1.cram
[smoove] 2023/08/15 19:26:41 starting with version 0.2.8
[smoove] 2023/08/15 19:26:41 calculating bam stats for 1 bams
[smoove] 2023/08/15 19:26:42 done calculating bam stats
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 7626 (0.00%) with low mapq, depth > 1000, or from excluded chroms from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 7626 (0.00%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.disc.bam
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 6039 (0.00%) with low mapq, depth > 1000, or from excluded chroms from NA24385_chr1.split.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 6039 (0.00%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.split.bam
[smoove] 2023/08/15 19:26:42 kept 0 putative orphans
[smoove] 2023/08/15 19:26:42 removed 0 discordant orphans in 0 seconds
[smoove] 2023/08/15 19:26:42 kept 24 putative orphans
[smoove] 2023/08/15 19:26:42 removed 4 split orphans in 0 seconds
[smoove] 2023/08/15 19:26:42 removed 0 singletons and isolated interchromosomals of 7626 reads (0.00%) from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 7626 reads (100.00%) of the original 7626 remain from NA24385_chr1.disc.bam
[smoove] 2023/08/15 19:26:42 removed 0 singletons of 6039 reads (0.00%) from NA24385_chr1.split.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 6039 reads (100.00%) of the original 6039 remain from NA24385_chr1.split.bam
[smoove] 2023/08/15 19:26:42 starting lumpy
[smoove] 2023/08/15 19:26:42 wrote lumpy command to to/NA24385-svs-cram-lumpy-cmd.sh
[smoove] 2023/08/15 19:26:42 writing sorted, indexed file to to/NA24385-svs-cram-smoove.genotyped.vcf.gz
[smoove] 2023/08/15 19:26:42 excluding variants with all unknown or homozygous reference genotypes
[smoove] 2023/08/15 19:26:42 616 0
[smoove] 2023/08/15 19:26:42 1 1000000
[smoove] 2023/08/15 19:26:42 > gsort version 0.1.4
[smoove] 2023/08/15 19:26:51 wrote sorted, indexed file to to/NA24385-svs-cram-smoove.genotyped.vcf.gz
using smoove 0.2.7
GFF="Homo_sapiens.GRCh38.105.gff3.gz" # http://ftp.ensembl.org/pub/release-105/gff3/homo_sapiens/Homo_sapiens.GRCh38.105.gff3.gz
smoove annotate --gff ${GFF} ${outdir}/${SAMPLE}.vcf.gz | bgzip -c > ${outdir}/${SAMPLE}.annotated.vcf.gz"
Do you know why I just receive for MSHQ/SHQ -1? I am just using a single sample. Therefore, I asuume MSHQ and SHQ should be the same.
I will try the docker version next, but I am not sure if that will fix my issue.
Currently, I am using the RP+SR values in the VCF, but filtering on the calculated sum does not make sense.![image](https://github.com/brentp/smoove/assets/73748542/533fde33-d195-4cf0-b5e6-09fec5cc819e)
My goal is to filter for high quality deletions.
Help is much appreciated.