brentp / smoove

structural variant calling and genotyping with existing tools, but, smoothly.
Apache License 2.0
222 stars 21 forks source link

When I use smoove annotate, I just receive a MSHQ/SHQ of -1 #225

Open robertzeibich opened 10 months ago

robertzeibich commented 10 months ago

using smoove 0.2.7

GFF="Homo_sapiens.GRCh38.105.gff3.gz" # http://ftp.ensembl.org/pub/release-105/gff3/homo_sapiens/Homo_sapiens.GRCh38.105.gff3.gz

smoove annotate --gff ${GFF} ${outdir}/${SAMPLE}.vcf.gz | bgzip -c > ${outdir}/${SAMPLE}.annotated.vcf.gz"

Do you know why I just receive for MSHQ/SHQ -1? I am just using a single sample. Therefore, I asuume MSHQ and SHQ should be the same.

I will try the docker version next, but I am not sure if that will fix my issue.

Currently, I am using the RP+SR values in the VCF, but filtering on the calculated sum does not make sense. image

My goal is to filter for high quality deletions.

Help is much appreciated.

brentp commented 10 months ago

I would suggest to use the duphold values like DHFFC for filtering rather than SHQ. But if you do want to use it, can you share a variant so I can see what you mean?

robertzeibich commented 10 months ago

Sure

chr1 1666979 3 N <DEL> . . SVTYPE=DEL;STRANDS=+-:6;SVLEN=-202;END=1667181;CIPOS=-30,196;CIEND=-201,29;CIPOS95=-5,41;CIEND95=-70, 4;IMPRECISE;SU=6;PE=6;SR=0;PRPOS=1.94054e-09,3.5228e-09,6.39347e-09,1.15977e-08,2.10337e-08,3.81036e-08,6.8987e-08,1.24837e-07,2.25845e-07,4.08063e-07,7.3695 e-07,1.33009e-06,2.39973e-06,4.32612e-06,7.79666e-06,1.40352e-05,2.52488e-05,4.54019e-05,8.1568e-05,0.00014643,0.000262604,0.000470833,0.000843363,0.00150982 ,0.00270086,0.00482733,0.00861743,0.0153798,0.0274186,0.0359165,0.0470137,0.045247,0.0434684,0.0417257,0.0400227,0.0383638,0.0367318,0.0351409,0.0335678,0.03 20544,0.0305828,0.0291511,0.0277528,0.0263923,0.0250728,0.0238056,0.0225849,0.0213844,0.0202263,0.0191107,0.0180412,0.0170062,0.0160146,0.0150715,0.0141742,0 .0133196,0.0124999,0.0117177,0.0109715,0.0102595,0.00958614,0.00894291,0.00833596,0.00775814,0.00721409,0.00670673,0.00622937,0.00577541,0.0053489,0.00494703 ,0.00456942,0.00421845,0.00388762,0.00357949,0.00328931,0.00302315,0.00277422,0.00254228,0.00232682,0.00212812,0.00194336,0.00177233,0.00161485,0.00147069,0. 00133636,0.00121358,0.00110055,0.000997179,0.000901402,0.000814204,0.000734701,0.000662464,0.000596335,0.000535921,0.000481279,0.000431895,0.000387482,0.0003 4682,0.000310174,0.000277093,0.000247184,0.000220231,0.000196202,0.000174509,0.000154848,0.000137414,0.000122082,0.0001082,9.58199e-05,8.45953e-05,7.47519e-0 5,6.58245e-05,5.78675e-05,5.08174e-05,4.46069e-05,3.90589e-05,3.41848e-05,2.99071e-05,2.61335e-05,2.2817e-05,1.98987e-05,1.73055e-05,1.5063e-05,1.30982e-05,1 .13761e-05,9.8337e-06,8.52081e-06,7.38187e-06,6.40051e-06,5.52535e-06,4.76828e-06,4.10732e-06,3.54353e-06,3.04909e-06,2.61766e-06,2.24226e-06,1.9234e-06,1.64 77e-06,1.4075e-06,1.20104e-06,1.02558e-06,8.74974e-07,7.46385e-07,6.35902e-07,5.40234e-07,4.573e-07,3.87769e-07,3.28302e-07,2.77978e-07,2.3432e-07,1.98266e-0 7,1.66696e-07,1.4069e-07,1.18052e-07,9.95003e-08,8.35258e-08,7.00463e-08,5.86491e-08,4.92874e-08,4.13363e-08,3.4389e-08,2.87651e-08,2.39753e-08,1.99732e-08,1 .65566e-08,1.38008e-08,1.1481e-08,9.53977e-09,7.91739e-09,6.61129e-09,5.50405e-09,4.5665e-09,3.78319e-09,3.14497e-09,2.59712e-09,2.14727e-09,1.77158e-09,1.46 583e-09,1.20275e-09,9.93711e-10,8.17587e-10,6.73413e-10,5.52689e-10,4.5296e-10,3.7154e-10,3.03631e-10,2.47954e-10,2.01176e-10,1.64171e-10,1.34023e-10,1.093e- 10,8.91016e-11,7.2747e-11,5.91578e-11,4.802e-11,3.89415e-11,3.15325e-11,2.52383e-11,2.04493e-11,1.63367e-11,1.30483e-11,1.03707e-11,8.37709e-12,6.66609e-12,5 .27686e-12,4.21012e-12,3.37785e-12,2.67151e-12,2.11162e-12,1.67425e-12,1.34876e-12,1.05594e-12,8.36908e-13,6.58145e-13,5.23885e-13,4.09982e-13,3.21031e-13,2. 49168e-13,1.92059e-13,1.47045e-13,1.10975e-13,8.47616e-14,6.20617e-14,4.58748e-14,3.32256e-14,2.4066e-14,1.68237e-14;PREND=1.16493e-11,1.43876e-11,1.77304e-1 1,2.20113e-11,2.72982e-11,3.37463e-11,4.15877e-11,5.16645e-11,6.37343e-11,7.89015e-11,9.72796e-11,1.20625e-10,1.48531e-10,1.8275e-10,2.25798e-10,2.78345e-10, 3.42522e-10,4.2006e-10,5.1739e-10,6.33153e-10,7.79946e-10,9.54462e-10,1.17288e-09,1.43547e-09,1.75632e-09,2.14963e-09,2.63445e-09,3.2164e-09,3.92185e-09,4.76 661e-09,5.80303e-09,7.05294e-09,8.5846e-09,1.04416e-08,1.26696e-08,1.53493e-08,1.86427e-08,2.25572e-08,2.72769e-08,3.28591e-08,3.96665e-08,4.78175e-08,5.7585 e-08,6.9084e-08,8.2999e-08,9.95299e-08,1.19744e-07,1.43297e-07,1.71332e-07,2.04909e-07,2.44869e-07,2.92213e-07,3.48265e-07,4.14439e-07,4.92853e-07,5.85911e-0 7,6.95093e-07,8.22937e-07,9.72466e-07,1.14875e-06,1.35545e-06,1.59336e-06,1.8757e-06,2.20108e-06,2.57914e-06,3.01686e-06,3.52833e-06,4.12642e-06,4.81258e-06, 5.59727e-06,6.51014e-06,7.55983e-06,8.75836e-06,1.01446e-05,1.17354e-05,1.35336e-05,1.56008e-05,1.79545e-05,2.06578e-05,2.37024e-05,2.71487e-05,3.10711e-05,3 .54895e-05,4.04818e-05,4.60689e-05,5.23417e-05,5.9421e-05,6.73189e-05,7.60653e-05,8.58397e-05,9.67171e-05,0.000108855,0.000122299,0.000137177,0.000153638,0.0 00171694,0.000191611,0.000213744,0.000237718,0.000264086,0.000292625,0.00032401,0.000358132,0.000395333,0.000435559,0.000479081,0.000526007,0.000577183,0.000 632179,0.000691364,0.000754708,0.000822328,0.000894761,0.000972364,0.00105493,0.00114296,0.00123648,0.00133576,0.00144115,0.00155392,0.00167222,0.00179607,0. 00192669,0.00206393,0.0022089,0.00235792,0.00251429,0.00267896,0.00285022,0.00302834,0.00321272,0.00340377,0.00360188,0.0038082,0.00402021,0.00423954,0.00446 41,0.00469728,0.00493775,0.00518361,0.00543329,0.00568866,0.00595346,0.00622102,0.00649151,0.00676985,0.00705192,0.00733642,0.00762634,0.00791954,0.00821555, 0.00851802,0.00882087,0.00912685,0.00943192,0.00974282,0.0100556,0.0103661,0.010679,0.0109917,0.0113052,0.0116203,0.0119356,0.012243,0.0125527,0.0128624,0.01 31705,0.0134696,0.0137714,0.0140726,0.0143717,0.014665,0.0149567,0.0152453,0.0155329,0.0158114,0.0160891,0.0163606,0.0166273,0.0168918,0.017155,0.0174102,0.0 176603,0.0179087,0.0181521,0.0183899,0.0186198,0.0188457,0.0190677,0.0192878,0.0195002,0.0197075,0.019912,0.0201097,0.0203043,0.0204951,0.0206781,0.0208558,0 .0210314,0.0212025,0.0213654,0.0215219,0.0159421,0.0118049,0.00642711,0.00349906,0.00190448,0.00103648,0.000563912,0.000306741,0.000166833,9.07308e-05,4.9334 6e-05,2.68181e-05,1.45728e-05,5.82464e-06,2.32765e-06,9.29999e-07,3.71552e-07,1.48367e-07,4.35781e-08,1.2799e-08,3.75884e-09,1.10367e-09,3.2406e-10,9.51338e- 11,2.7928e-11,8.19759e-12,2.40606e-12,7.06057e-13,2.07181e-13;MSHQ=-1;smoove_gene=SLC35E2B|gene:1:203 GT:SU:PE:SR:SHQ ./.:6:6:0:-1

Based on the smoove instruction, I was hoping to get different values:

image

brentp commented 10 months ago

Hi, your sample is not HET so the value is -1. But smoove also requires other fields: https://github.com/brentp/smoove/blob/master/annotate/annotate.go#L349-L377 you probably need to run smoove call with the --genotype flag.

robertzeibich commented 10 months ago

I used the genotype flag. These are the exact commands I used: smoove call --outdir ${outdir} --name ${SAMPLE} --fasta ${FASTA} -p $(nproc) --genotype ${BAM} sh ${outdir}/${SAMPLE}-lumpy-cmd.sh | bgzip -c > ${outdir}/${SAMPLE}.vcf.gz smoove annotate --gff ${GFF} ${outdir}/${SAMPLE}.vcf.gz | bgzip -c > ${outdir}/${SAMPLE}.annotated.vcf.gz

robertzeibich commented 10 months ago

We also tried to run the docker container on our HPC cluster, but we are receiving the following error:

module load singularity singularity build smoove.sif docker://brentp/smoothe INFO: Starting build... FATAL: While performing build: conveyor failed to get: reading manifest latest in docker.io/brentp/smoothe: errors: denied: requested access to the resource is denied unauthorized: authentication required

I am assuming that the docker container might provide me with different results, but I might be mistaken here.

brentp commented 10 months ago

you have "smoothe" instead of "smoove"

brentp commented 10 months ago

you only need to run smoove call --genotype do not run the lumpy_cmd.sh, that is run by smoove and when you run it, you overwrite the output of smoove.

robertzeibich commented 10 months ago

Thanks for your replies. I am currently rerunning smoove. I will keep you posted on my progress.

robertzeibich commented 10 months ago

I ran lumpy-cmd.sh in the past because I could not find a vcf outpuf file anywhere.

smoove call --outdir /scratch/xm41/ct/GREP_ANALYSIS/30x/SV/SMOOVE_RERUN/GRMH001 --name GRMH001 --fasta /scratch/xm41/hg38_resources/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta -p 1 --genotype /scratch/xm41/ct/bams/GRMH001.analy
sis.ready.bam
[smoove] 2023/08/11 18:09:55 starting with version 0.2.7
[smoove] 2023/08/11 18:09:56 calculating bam stats for 1 bams
[smoove] 2023/08/11 18:10:16 done calculating bam stats
[smoove]: ([E]lumpy-filter) 2023/08/11 20:33:34 [lumpy_filter] extracted splits and discordants from 1546746110 total aligned reads
[smoove]:2023/08/11 20:33:36 finished process: lumpy-filter (set -eu; lumpy_filter -f /scratch/xm41/hg38_resources/resources_broad_hg38_v0_Homo_sapiens_assembly3) in user-time:58m29.343551s system-time:10m27.852504s
[smoove] 2023/08/11 20:40:54 removed 11150202 alignments out of 43897434 (25.40%) with low mapq, depth > 1000, or from excluded chroms from GRMH001.disc.bam in 437 seconds
[smoove] 2023/08/11 20:40:54 removed 957442 alignments out of 43897434 (2.18%) that were bad interchromosomals or flanked-splitters from GRMH001.disc.bam
[smoove] 2023/08/11 21:01:07 kept 81974 putative orphans
[smoove] 2023/08/11 21:01:07 removed 11530441 discordant orphans in 1061 seconds
[smoove] 2023/08/11 21:03:10 removed 27214394 singletons and isolated interchromosomals of 31789790 reads (85.61%) from GRMH001.disc.bam in 1337 seconds
[smoove] 2023/08/11 21:03:10 4575396 reads (10.42%) of the original 43897434 remain from GRMH001.disc.bam
[smoove] 2023/08/11 21:04:06 removed 1417613 alignments out of 2787064 (50.86%) with low mapq, depth > 1000, or from excluded chroms from GRMH001.split.bam in 52 seconds
[smoove] 2023/08/11 21:04:06 removed 129065 alignments out of 2787064 (4.63%) that were bad interchromosomals or flanked-splitters from GRMH001.split.bam
[smoove] 2023/08/11 21:05:10 kept 100569 putative orphans
[smoove] 2023/08/11 21:05:10 removed 166204 split orphans in 51 seconds
[smoove] 2023/08/11 21:05:16 removed 611375 singletons of 1240386 reads (49.29%) from GRMH001.split.bam in 70 seconds
[smoove] 2023/08/11 21:05:16 629011 reads (22.57%) of the original 2787064 remain from GRMH001.split.bam
[smoove] 2023/08/11 21:05:17 starting lumpy
[smoove] 2023/08/11 21:05:17 wrote lumpy command to /scratch/xm41/ct/GREP_ANALYSIS/30x/SV/SMOOVE_RERUN/GRMH001/GRMH001-lumpy-cmd.sh
[smoove] 2023/08/11 21:05:17 gsort and bcftools required for svtyper

These are the current output files:

-rw-rw-rw-+ 1 rzei0002 xm41 138M Aug 11 21:03 GRMH001.disc.bam
-rw-rw-rw-+ 1 rzei0002 xm41 650K Aug 11 21:03 GRMH001.disc.bam.csi
-rw-rw-rw-+ 1 rzei0002 xm41 2.1G Aug 11 20:33 GRMH001.disc.bam.orig.bam
-rw-rw-rw-+ 1 rzei0002 xm41 6.8K Aug 11 18:10 GRMH001.histo
-rw-rw-rw-+ 1 rzei0002 xm41  544 Aug 11 21:05 GRMH001-lumpy-cmd.sh
-rw-rw-rw-+ 1 rzei0002 xm41  33M Aug 11 21:05 GRMH001.split.bam
-rw-rw-rw-+ 1 rzei0002 xm41 1.1M Aug 11 21:05 GRMH001.split.bam.csi
-rw-rw-rw-+ 1 rzei0002 xm41 279M Aug 11 20:33 GRMH001.split.bam.orig.bam

What am I doing wrong?

brentp commented 10 months ago
gsort and bcftools required for svtyper

so it is not able to genotype. you must install those.

robertzeibich commented 10 months ago

what is the best way to install smoove? I tried conda install -c bioconda smoove, but I am running into the previous mentioned svtyper and bcftools errors and I do not know how to fix them. Should I download the latest release and then install the required packages in a separate conda environment?

` [ ] bgzip [ sort -> (compress) -> index ] [ ] gsort [(sort) -> compress -> index ] [ ] tabix [ sort -> compress -> (index)] [ ] lumpy [ ] lumpy_filter [ ] samtools [ ] svtyper [ ] mosdepth [extra filtering of split and discordant files for better scaling]

[ ] duphold [(optional) annotate calls with depth changes] [ ] svtools [only needed for large cohorts]. `

robertzeibich commented 10 months ago

For everyone who struggles with the installation:

conda create -n smoove
conda install -c bioconda smoove
conda install svtyper=0.7.0

RESULT

(smoove) [rzei0002@m3-login1 tests]$ smoove call -o to --processes 2 -x --genotype --fasta $REF --name NA24385-svs-cram --excludechroms '~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy' NA24385_chr1.cram
[smoove] 2023/08/15 19:26:41 starting with version 0.2.8
[smoove] 2023/08/15 19:26:41 calculating bam stats for 1 bams
[smoove] 2023/08/15 19:26:42 done calculating bam stats
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 7626 (0.00%) with low mapq, depth > 1000, or from excluded chroms from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 7626 (0.00%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.disc.bam
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 6039 (0.00%) with low mapq, depth > 1000, or from excluded chroms from NA24385_chr1.split.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 removed 0 alignments out of 6039 (0.00%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.split.bam
[smoove] 2023/08/15 19:26:42 kept 0 putative orphans
[smoove] 2023/08/15 19:26:42 removed 0 discordant orphans in 0 seconds
[smoove] 2023/08/15 19:26:42 kept 24 putative orphans
[smoove] 2023/08/15 19:26:42 removed 4 split orphans in 0 seconds
[smoove] 2023/08/15 19:26:42 removed 0 singletons and isolated interchromosomals of 7626 reads (0.00%) from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 7626 reads (100.00%) of the original 7626 remain from NA24385_chr1.disc.bam
[smoove] 2023/08/15 19:26:42 removed 0 singletons of 6039 reads (0.00%) from NA24385_chr1.split.bam in 0 seconds
[smoove] 2023/08/15 19:26:42 6039 reads (100.00%) of the original 6039 remain from NA24385_chr1.split.bam
[smoove] 2023/08/15 19:26:42 starting lumpy
[smoove] 2023/08/15 19:26:42 wrote lumpy command to to/NA24385-svs-cram-lumpy-cmd.sh
[smoove] 2023/08/15 19:26:42 writing sorted, indexed file to to/NA24385-svs-cram-smoove.genotyped.vcf.gz
[smoove] 2023/08/15 19:26:42 excluding variants with all unknown or homozygous reference genotypes
[smoove] 2023/08/15 19:26:42 616        0
[smoove] 2023/08/15 19:26:42 1  1000000
[smoove] 2023/08/15 19:26:42 > gsort version 0.1.4
[smoove] 2023/08/15 19:26:51 wrote sorted, indexed file to to/NA24385-svs-cram-smoove.genotyped.vcf.gz