bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

Error during MAVIS setup #246

Closed sarahdada closed 3 years ago

sarahdada commented 3 years ago

Describe the bug MAVIS shows the following error during setup. It might be an issue with parsing. I am running manta-1.4.0, and the data might be in the wrong format.

To Reproduce Steps to reproduce the behavior: the .cfg file was made using the command

    mavis config \
      --library PG00038 genome normal False hg38_PG1.bam \
      --library PG00037 genome normal False hg38_PG2.bam \
      --library PG00048 genome diseased False hg38_PG3.bam \
      --convert delly PG_dellysomatic_hg38.vcf delly \
      --convert manta diploidSV.de_novo.vcf manta \
      --assign PG1 delly manta \
      --assign PG2 delly manta \
      --assign PG3 delly manta \
      -w PG0001_2_3.cfg

The problem

  1. Run the command 'mavis setup PG0001_2_3.cfg -o path/to/results'
  2. See the error:

     ...
       File "/home/user/miniconda3/bin/mavis", line 8, in <module>
         sys.exit(main())
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/main.py", line 600, in main
         raise err
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/main.py", line 582, in main
         pipeline = _pipeline.Pipeline.build(config)
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 350, in build
         libconf.inputs = run_conversion(config, libconf, conversion_dir)
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 81, in run_conversion
         convert_tool_output(
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/tools/__init__.py", line 34, in convert_tool_output
         _convert_tool_output(
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/tools/__init__.py", line 276, in _convert_tool_output
         rows = read_vcf(input_file, file_type, log)
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 197, in convert_file
         raise err
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 194, in convert_file
         rows.extend(convert_record(vcf_record, log=log))
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 110, in convert_record
         chr2, end, orient1, orient2, ref, alt = parse_bnd_alt(alt)
       File "/home/user/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 74, in parse_bnd_alt
         raise NotImplementedError('alt specification in unexpected format', alt)
     NotImplementedError: ('alt specification in unexpected format', ']chr19:27302623]')

Expected behavior Consolidation of SV data


creisle commented 3 years ago

@sarahdada could you include the VCF header and the variant line in question? Which file did the error occur on? (There should be a log message above the error showing which file it was converting.)

To get the relevant lines you can use grep

grep 'chr19:27302623' diploidSV.de_novo.vcf

and the header

grep '^##' diploidSV.de_novo.vcf

Make sure to remove any identifiers before posting (library names, patient names, etc)

sarahdada commented 3 years ago

Grepping for chr19 doesn't seem to come up with anything, so maybe the SV caller format is off.


##fileformat=VCFv4.1
##fileDate=20210415
##source=GenerateSVCandidates 1.4.0
##reference=file:///projects/file/hg38_no_alt.fa
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chrM,length=16569>
##contig=<ID=chr1_KI270706v1_random,length=175055>
##contig=<ID=chr1_KI270707v1_random,length=32032>
##contig=<ID=chr1_KI270708v1_random,length=127682>
##contig=<ID=chr1_KI270709v1_random,length=66860>
##contig=<ID=chr1_KI270710v1_random,length=40176>
##contig=<ID=chr1_KI270711v1_random,length=42210>
##contig=<ID=chr1_KI270712v1_random,length=176043>
##contig=<ID=chr1_KI270713v1_random,length=40745>
##contig=<ID=chr1_KI270714v1_random,length=41717>
##contig=<ID=chr2_KI270715v1_random,length=161471>
##contig=<ID=chr2_KI270716v1_random,length=153799>
##contig=<ID=chr3_GL000221v1_random,length=155397>
##contig=<ID=chr4_GL000008v2_random,length=209709>
##contig=<ID=chr5_GL000208v1_random,length=92689>
##contig=<ID=chr9_KI270717v1_random,length=40062>
##contig=<ID=chr9_KI270718v1_random,length=38054>
##contig=<ID=chr9_KI270719v1_random,length=176845>
##contig=<ID=chr9_KI270720v1_random,length=39050>
##contig=<ID=chr11_KI270721v1_random,length=100316>
##contig=<ID=chr14_GL000009v2_random,length=201709>
##contig=<ID=chr14_GL000225v1_random,length=211173>
##contig=<ID=chr14_KI270722v1_random,length=194050>
##contig=<ID=chr14_GL000194v1_random,length=191469>
##contig=<ID=chr14_KI270723v1_random,length=38115>
##contig=<ID=chr14_KI270724v1_random,length=39555>
##contig=<ID=chr14_KI270725v1_random,length=172810>
##contig=<ID=chr14_KI270726v1_random,length=43739>
##contig=<ID=chr15_KI270727v1_random,length=448248>
##contig=<ID=chr16_KI270728v1_random,length=1872759>
##contig=<ID=chr17_GL000205v2_random,length=185591>
##contig=<ID=chr17_KI270729v1_random,length=280839>
##contig=<ID=chr17_KI270730v1_random,length=112551>
##contig=<ID=chr22_KI270731v1_random,length=150754>
##contig=<ID=chr22_KI270732v1_random,length=41543>
##contig=<ID=chr22_KI270733v1_random,length=179772>
##contig=<ID=chr22_KI270734v1_random,length=165050>
##contig=<ID=chr22_KI270735v1_random,length=42811>
##contig=<ID=chr22_KI270736v1_random,length=181920>
##contig=<ID=chr22_KI270737v1_random,length=103838>
##contig=<ID=chr22_KI270738v1_random,length=99375>
##contig=<ID=chr22_KI270739v1_random,length=73985>
##contig=<ID=chrY_KI270740v1_random,length=37240>
##contig=<ID=chrUn_KI270302v1,length=2274>
##contig=<ID=chrUn_KI270304v1,length=2165>
##contig=<ID=chrUn_KI270303v1,length=1942>
##contig=<ID=chrUn_KI270305v1,length=1472>
##contig=<ID=chrUn_KI270322v1,length=21476>
##contig=<ID=chrUn_KI270320v1,length=4416>
##contig=<ID=chrUn_KI270310v1,length=1201>
##contig=<ID=chrUn_KI270316v1,length=1444>
##contig=<ID=chrUn_KI270315v1,length=2276>
##contig=<ID=chrUn_KI270312v1,length=998>
##contig=<ID=chrUn_KI270311v1,length=12399>
##contig=<ID=chrUn_KI270317v1,length=37690>
##contig=<ID=chrUn_KI270412v1,length=1179>
##contig=<ID=chrUn_KI270411v1,length=2646>
##contig=<ID=chrUn_KI270414v1,length=2489>
##contig=<ID=chrUn_KI270419v1,length=1029>
##contig=<ID=chrUn_KI270418v1,length=2145>
##contig=<ID=chrUn_KI270420v1,length=2321>
##contig=<ID=chrUn_KI270424v1,length=2140>
##contig=<ID=chrUn_KI270417v1,length=2043>
##contig=<ID=chrUn_KI270422v1,length=1445>
##contig=<ID=chrUn_KI270423v1,length=981>
##contig=<ID=chrUn_KI270425v1,length=1884>
##contig=<ID=chrUn_KI270429v1,length=1361>
##contig=<ID=chrUn_KI270442v1,length=392061>
##contig=<ID=chrUn_KI270466v1,length=1233>
##contig=<ID=chrUn_KI270465v1,length=1774>
##contig=<ID=chrUn_KI270467v1,length=3920>
##contig=<ID=chrUn_KI270435v1,length=92983>
##contig=<ID=chrUn_KI270438v1,length=112505>
##contig=<ID=chrUn_KI270468v1,length=4055>
##contig=<ID=chrUn_KI270510v1,length=2415>
##contig=<ID=chrUn_KI270509v1,length=2318>
##contig=<ID=chrUn_KI270518v1,length=2186>
##contig=<ID=chrUn_KI270508v1,length=1951>
##contig=<ID=chrUn_KI270516v1,length=1300>
##contig=<ID=chrUn_KI270512v1,length=22689>
##contig=<ID=chrUn_KI270519v1,length=138126>
##contig=<ID=chrUn_KI270522v1,length=5674>
##contig=<ID=chrUn_KI270511v1,length=8127>
##contig=<ID=chrUn_KI270515v1,length=6361>
##contig=<ID=chrUn_KI270507v1,length=5353>
##contig=<ID=chrUn_KI270517v1,length=3253>
##contig=<ID=chrUn_KI270529v1,length=1899>
##contig=<ID=chrUn_KI270528v1,length=2983>
##contig=<ID=chrUn_KI270530v1,length=2168>
##contig=<ID=chrUn_KI270539v1,length=993>
##contig=<ID=chrUn_KI270538v1,length=91309>
##contig=<ID=chrUn_KI270544v1,length=1202>
##contig=<ID=chrUn_KI270548v1,length=1599>
##contig=<ID=chrUn_KI270583v1,length=1400>
##contig=<ID=chrUn_KI270587v1,length=2969>
##contig=<ID=chrUn_KI270580v1,length=1553>
##contig=<ID=chrUn_KI270581v1,length=7046>
##contig=<ID=chrUn_KI270579v1,length=31033>
##contig=<ID=chrUn_KI270589v1,length=44474>
##contig=<ID=chrUn_KI270590v1,length=4685>
##contig=<ID=chrUn_KI270584v1,length=4513>
##contig=<ID=chrUn_KI270582v1,length=6504>
##contig=<ID=chrUn_KI270588v1,length=6158>
##contig=<ID=chrUn_KI270593v1,length=3041>
##contig=<ID=chrUn_KI270591v1,length=5796>
##contig=<ID=chrUn_KI270330v1,length=1652>
##contig=<ID=chrUn_KI270329v1,length=1040>
##contig=<ID=chrUn_KI270334v1,length=1368>
##contig=<ID=chrUn_KI270333v1,length=2699>
##contig=<ID=chrUn_KI270335v1,length=1048>
##contig=<ID=chrUn_KI270338v1,length=1428>
##contig=<ID=chrUn_KI270340v1,length=1428>
##contig=<ID=chrUn_KI270336v1,length=1026>
##contig=<ID=chrUn_KI270337v1,length=1121>
##contig=<ID=chrUn_KI270363v1,length=1803>
##contig=<ID=chrUn_KI270364v1,length=2855>
##contig=<ID=chrUn_KI270362v1,length=3530>
##contig=<ID=chrUn_KI270366v1,length=8320>
##contig=<ID=chrUn_KI270378v1,length=1048>
##contig=<ID=chrUn_KI270379v1,length=1045>
##contig=<ID=chrUn_KI270389v1,length=1298>
##contig=<ID=chrUn_KI270390v1,length=2387>
##contig=<ID=chrUn_KI270387v1,length=1537>
##contig=<ID=chrUn_KI270395v1,length=1143>
##contig=<ID=chrUn_KI270396v1,length=1880>
##contig=<ID=chrUn_KI270388v1,length=1216>
##contig=<ID=chrUn_KI270394v1,length=970>
##contig=<ID=chrUn_KI270386v1,length=1788>
##contig=<ID=chrUn_KI270391v1,length=1484>
##contig=<ID=chrUn_KI270383v1,length=1750>
##contig=<ID=chrUn_KI270393v1,length=1308>
##contig=<ID=chrUn_KI270384v1,length=1658>
##contig=<ID=chrUn_KI270392v1,length=971>
##contig=<ID=chrUn_KI270381v1,length=1930>
##contig=<ID=chrUn_KI270385v1,length=990>
##contig=<ID=chrUn_KI270382v1,length=4215>
##contig=<ID=chrUn_KI270376v1,length=1136>
##contig=<ID=chrUn_KI270374v1,length=2656>
##contig=<ID=chrUn_KI270372v1,length=1650>
##contig=<ID=chrUn_KI270373v1,length=1451>
##contig=<ID=chrUn_KI270375v1,length=2378>
##contig=<ID=chrUn_KI270371v1,length=2805>
##contig=<ID=chrUn_KI270448v1,length=7992>
##contig=<ID=chrUn_KI270521v1,length=7642>
##contig=<ID=chrUn_GL000195v1,length=182896>
##contig=<ID=chrUn_GL000219v1,length=179198>
##contig=<ID=chrUn_GL000220v1,length=161802>
##contig=<ID=chrUn_GL000224v1,length=179693>
##contig=<ID=chrUn_KI270741v1,length=157432>
##contig=<ID=chrUn_GL000226v1,length=15008>
##contig=<ID=chrUn_GL000213v1,length=164239>
##contig=<ID=chrUn_KI270743v1,length=210658>
##contig=<ID=chrUn_KI270744v1,length=168472>
##contig=<ID=chrUn_KI270745v1,length=41891>
##contig=<ID=chrUn_KI270746v1,length=66486>
##contig=<ID=chrUn_KI270747v1,length=198735>
##contig=<ID=chrUn_KI270748v1,length=93321>
##contig=<ID=chrUn_KI270749v1,length=158759>
##contig=<ID=chrUn_KI270750v1,length=148850>
##contig=<ID=chrUn_KI270751v1,length=150742>
##contig=<ID=chrUn_KI270752v1,length=27745>
##contig=<ID=chrUn_KI270753v1,length=62944>
##contig=<ID=chrUn_KI270754v1,length=40191>
##contig=<ID=chrUn_KI270755v1,length=36723>
##contig=<ID=chrUn_KI270756v1,length=79590>
##contig=<ID=chrUn_KI270757v1,length=71251>
##contig=<ID=chrUn_GL000214v1,length=137718>
##contig=<ID=chrUn_KI270742v1,length=186739>
##contig=<ID=chrUn_GL000216v2,length=176608>
##contig=<ID=chrUn_GL000218v1,length=161147>
##contig=<ID=chrEBV,length=171823>
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="CIGAR alignment for each alternate indel allele">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical homology at event breakpoints">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical homology at event breakpoints">
##INFO=<ID=SVINSLEN,Number=.,Type=Integer,Description="Length of insertion">
##INFO=<ID=SVINSSEQ,Number=.,Type=String,Description="Sequence of insertion">
##INFO=<ID=LEFT_SVINSSEQ,Number=.,Type=String,Description="Known left side of insertion for an insertion of unknown length">
##INFO=<ID=RIGHT_SVINSSEQ,Number=.,Type=String,Description="Known right side of insertion for an insertion of unknown length">
##INFO=<ID=INV3,Number=0,Type=Flag,Description="Inversion breakends open 3' of reported location">
##INFO=<ID=INV5,Number=0,Type=Flag,Description="Inversion breakends open 5' of reported location">
##INFO=<ID=BND_DEPTH,Number=1,Type=Integer,Description="Read depth at local translocation breakend">
##INFO=<ID=MATE_BND_DEPTH,Number=1,Type=Integer,Description="Read depth at remote translocation mate breakend">
##INFO=<ID=JUNCTION_QUAL,Number=1,Type=Integer,Description="If the SV junction is part of an EVENT (ie. a multi-adjacency variant), this field provides the QUAL value for the adjacency in question only">
##FORMAT=<ID=DQ,Number=1,Type=Integer,Description="De novo quality score">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=FT,Number=1,Type=String,Description="Sample filter, 'PASS' indicates that all filters have passed for this sample">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PR,Number=.,Type=Integer,Description="Spanning paired-read support for the ref and alt alleles in the order listed">
##FORMAT=<ID=SR,Number=.,Type=Integer,Description="Split reads for the ref and alt alleles in the order listed, for reads where P(allele|read)>0.999">
##FILTER=<ID=Ploidy,Description="For DEL & DUP variants, the genotypes of overlapping variants (with similar size) are inconsistent with diploid expectation">
##FILTER=<ID=MaxDepth,Description="Depth is greater than 3x the median chromosome depth near one or both variant breakends">
##FILTER=<ID=MaxMQ0Frac,Description="For a small variant (<1000 bases), the fraction of reads in all samples with MAPQ0 around either breakend exceeds 0.4">
##FILTER=<ID=NoPairSupport,Description="For variants significantly larger than the paired read fragment size, no paired reads support the alternate allele in any sample.">
##FILTER=<ID=MinQUAL,Description="QUAL score is less than 20">
##FILTER=<ID=MinGQ,Description="GQ score is less than 15 (filter applied at sample level and record level if all samples are filtered)">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication">
##cmdline=/gsc/software/linux-x86_64-centos6/manta-1.4.0/bin/configManta.py --bam /projects/bam /projects/path//file1.bam --bam /projects/path/file2.bam --bam /projects/bam /projects/path/file3.bam --referenceFasta /projects/path/hg38_no_alt.fa --runDir /projects/path/MantaResults_withHeader/

sarahdada commented 3 years ago

The error comes when:

(base) [sdada@gphost14 MAVIS]$ mavis setup file.cfg -o /path/MAVIS
                      MAVIS: 2.2.9
                      hostname: gphost14.bcgsc.ca
[2021-04-28 14:39:13] arguments
                        command = 'setup'
                        config = '/projects/path/file.cfg'
                        log = None
                        log_level = 'INFO'
                        output = '/projects/path/MAVIS'
                        skip_stage = []
                      creating output directory: '/projects/path/MAVIS/converted_inputs'
                      setting up the directory structure for PG0003838 as /projects/path/MAVIS/PG0003_normal_genome
                      converting input command: ['convert_tool_output', '/projects/filepath/file1.vcf', 'delly', False]
                      reading: /projects/file1.vcf

Header for DELLY

(base) [sdada@gphost14 hg38_PG0003514_3838_4859_fastq_SV]$ grep '^#' file.vcf

##fileformat=VCFv4.2
##FILTER=
##fileDate=20210412
##ALT= (5 entries)
##FILTER=
##INFO= (20 entries)
##FORMAT= (12 entries)
##reference=/projects/sdada_prj/sdada_scratch/MSSNG/yingsfiles/hg38_no_alt.fa
##contig= (195 entries)
##bcftools_viewVersion=1.11+htslib-1.11
##bcftools_viewCommand=view -Ov PG0004859-BLD_PG0003514compared_dellysomatic_hg38.bcf; Date=Thu Apr 22 11:38:25 2021
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PG0004859-BLD_hg38_marked_dup PG0003514-BLD_hg38_marked_dup

Grepping the DELLY VCF for chr19 does return a result. I can't scroll up to the error, but I can quickly rerun it.


chrUn_KI270516v1        2       BND00073768     N       ]chr19:27302623]        67      LowQual PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.7;END=3;CHR2=chr19;POS2=27302623;PE=0;MAPQ=0;CT=5to3;CIPOS=-1,1;CIEND=-1,1;SRMAPQ=14;INSLEN=0;HOMLEN=0;SR=4;SRQ=0.951613;CONSENSUS=CAATTTGGAGAGTTTTGAGGCCTATTGTGGAAAGATATATCCTAAAATAAAAAATACACGGAAGCATTCTGAGAAACTTCATTGTTTTGTGTGCATTCAACTCACAGAGTTGAACCTATCT;CE=1.9285GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV  0/1:-129.223,0,-44.7728:10000:PASS:0:110747:110747:2:0:0:19:59  0/1:-107.543,0,-42.0154:10000:PASS:0:95614:95614:2:0:0:16:41

(base) [sdada@gphost14 DellyResults_withHeader]$ grep 'chr19:27302623' dellyfile2.vcf
chrUn_KI270516v1        2       BND00069701     N       ]chr19:27302623]        67      LowQual PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.7;END=3;CHR2=chr19;POS2=27302623;PE=0;MAPQ=0;CT=5to3;CIPOS=-1,1;CIEND=-1,1;SRMAPQ=14;INSLEN=0;HOMLEN=0;SR=4;SRQ=0.951613;CONSENSUS=CAATTTGGAGAGTTTTGAGGCCTATTGTGGAAAGATATATCCTAAAATAAAAAATACACGGAAGCATTCTGAGAAACTTCATTGTTTTGTGTGCATTCAACTCACAGAGTTGAACCTATCT;CE=1.9285GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV  0/1:-129.223,0,-44.7728:10000:PASS:0:110747:110747:2:0:0:19:59  0/1:-94.3442,0,-45.4931:10000:PASS:0:88163:88163:2:0:0:17:39

creisle commented 3 years ago

OK, so I actually might know what's going on here. It looks like DELLY is using an ALT format that is close to, but doesn't match, the VCF 4.2 specification (despite the header).

r = reference base/seq
u = untemplated sequence/alternate sequence
p = chromosome:position

They have ]p], but we expect that to include sequences: ]p]ur. The alternate sequence can be empty, but we don't consider the reference sequence optional. @calchoo can you double-check the VCF 4.2 format for this to make sure I'm interpreting it correctly?
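
For readers following along, the four breakend ALT forms listed in the VCF 4.2 specification can be checked with a short Python sketch. This is not MAVIS's `parse_bnd_alt`; the function name `parse_bnd_alt_strict` and the keys it returns are made up for illustration. It only shows why a bare `]p]` with no sequence on either side is rejected by a parser that treats the reference base as required.

```python
import re

# The VCF 4.2 specification lists four breakend ALT forms, where t is the
# REF base(s) plus any inserted sequence and p is chrom:pos:
#   t[p[    t]p]    ]p]t    [p[t
BND_ALT = re.compile(
    r"^(?P<before>[ACGTNacgtn]*)"
    r"(?P<open>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P<close>[\[\]])"
    r"(?P<after>[ACGTNacgtn]*)$"
)


def parse_bnd_alt_strict(alt):
    """Split a breakend ALT into its parts; raise ValueError if malformed.

    Hypothetical helper for illustration, not part of MAVIS.
    """
    match = BND_ALT.match(alt)
    if match is None or match["open"] != match["close"]:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    before, after = match["before"], match["after"]
    # Exactly one side of the brackets must carry sequence.
    if bool(before) == bool(after):
        raise ValueError(
            f"breakend ALT needs sequence on exactly one side: {alt!r}"
        )
    return {
        "chrom": match["chrom"],
        "pos": int(match["pos"]),
        "bracket": match["open"],
        "sequence": before or after,
        "sequence_first": bool(before),
    }


# Two well-formed ALTs from this thread, then the one that fails.
for alt in ("T]chr3:93470801]", "[chr3:93470364[G", "]chr19:27302623]"):
    try:
        print(alt, "->", parse_bnd_alt_strict(alt))
    except ValueError as err:
        print(alt, "-> rejected:", err)
```

The first two ALT values come from records that convert without trouble; the third is the one from the error message and is rejected because neither side of the brackets carries a base.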

creisle commented 3 years ago

What version of delly are you using?

sarahdada commented 3 years ago

I am using /gsc/software/linux-x86_64-centos7/delly-0.8.7/bin/delly. Thanks, Cara!!

sarahdada commented 3 years ago

I used delly-0.8.1 and am clearly making an error here.


                      MAVIS: 2.2.9
                      hostname: gphost14.bcgsc.ca
[2021-04-29 16:30:22] arguments
                        command = 'setup'
                        config = '/projects/pathMAVIS/file_both.cfg'
                        log = None
                        log_level = 'INFO'
                        output = '/projects/path/MAVIS'
                        skip_stage = []
Traceback (most recent call last):
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/util.py", line 65, in filepath
    file_list = bash_expands(path)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/util.py", line 158, in bash_expands
    raise FileNotFoundError('The expression does not match any files', expression)
FileNotFoundError: [Errno The expression does not match any files] None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sdada/miniconda3/bin/mavis", line 8, in <module>
    sys.exit(main())
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/main.py", line 482, in main
    config = _config.MavisConfig.read(args.config)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/config.py", line 379, in read
    return MavisConfig(**config_dict)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/config.py", line 264, in __init__
    raise err
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/config.py", line 257, in __init__
    self[sec] = validate_section(kwargs.pop(sec, {}), defaults, True)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/config.py", line 466, in validate_section
    value = cast_type(value)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/util.py", line 67, in filepath
    raise TypeError('File does not exist', path)
TypeError: Error in validating the reference section in the config. File does not exist None

creisle commented 3 years ago

TypeError: Error in validating the reference section in the config. File does not exist None

This sounds like you didn't add a reference file you need, or you set it to None. What does the reference section of your MAVIS config look like?
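
One way to spot this kind of problem before running `mavis setup` is to check every path in the reference section yourself. The sketch below is a rough illustration, not MAVIS code. It assumes the config is INI-style with a section named `reference` (the name used in the error message) and one path or glob pattern per key; the helper name `missing_reference_files` and the key names in the sample text are hypothetical.

```python
import configparser
import glob


def missing_reference_files(config_text, section="reference"):
    """Return {key: problem} for entries in the section that match no file.

    Hypothetical helper; assumes one path or glob pattern per key.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        return {section: "section is missing"}
    problems = {}
    for key, value in parser.items(section):
        value = value.strip()
        if not value or value == "None":
            problems[key] = "not set"
        elif not glob.glob(value):
            problems[key] = "no file matches " + repr(value)
    return problems


# Sample text with made-up key names and paths, for illustration only.
sample = """
[reference]
annotations = .
masking = None
reference_genome = /no/such/dir/hg38.fa
"""
for key, problem in missing_reference_files(sample).items():
    print(f"{key}: {problem}")
```

An empty result means every entry matched at least one file; anything reported as "not set" corresponds to the `None` seen in the traceback above.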

sarahdada commented 3 years ago

Hi Cara,

(Un)Shockingly, you are right; I didn't have the right masking etc. for my reference. I changed the reference. THANKS

sarahdada commented 3 years ago

Still failing, now on chr8. Really sorry if I'm messing something up with DELLY.
(base) [sdada@gphost14 MAVIS]$ mavis setup file.cfg -o /projects/path/MAVIS
                      MAVIS: 2.2.9
                      hostname: gphost14.bcgsc.ca
[2021-04-30 11:09:34] arguments
                        command = 'setup'
                        config = '/projects/path/MAVIS/file.cfg'
                        log = None
                        log_level = 'INFO'
                        output = '/projects/path/MAVIS'
                        skip_stage = []
                      creating output directory: '/projects/path/MAVIS/converted_inputs'
                      setting up the directory structure for PG0003838 as /projects/path/MAVIS/PG0003838_normal_genome
                      converting input command: ['convert_tool_output', '/projects/path/DELLY_0.8.1/patientcompared_dellysomatic_v81_hg38header.vcf', 'delly', False]
                      reading: /projects/path/DELLY_0.8.1/patientcompared_dellysomatic_v81_hg38header.vcf
Traceback (most recent call last):
  File "/home/sdada/miniconda3/bin/mavis", line 8, in <module>
    sys.exit(main())
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/main.py", line 600, in main
    raise err
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/main.py", line 582, in main
    pipeline = _pipeline.Pipeline.build(config)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 350, in build
    libconf.inputs = run_conversion(config, libconf, conversion_dir)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 81, in run_conversion
    convert_tool_output(
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/tools/__init__.py", line 34, in convert_tool_output
    _convert_tool_output(
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/tools/__init__.py", line 276, in _convert_tool_output
    rows = read_vcf(input_file, file_type, log)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 197, in convert_file
    raise err
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 194, in convert_file
    rows.extend(convert_record(vcf_record, log=log))
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 110, in convert_record
    chr2, end, orient1, orient2, ref, alt = parse_bnd_alt(alt)
  File "/home/sdada/miniconda3/lib/python3.8/site-packages/mavis/tools/vcf.py", line 74, in parse_bnd_alt
    raise NotImplementedError('alt specification in unexpected format', alt)
NotImplementedError: ('alt specification in unexpected format', ']chr8:43240816]')

sarahdada commented 3 years ago

 hello friends and fam

It seems like it's an issue with my VCF (note that the bottom record, the one with 'N', has a weird pattern and is missing a letter). I'm going to redo the BCF-to-VCF conversion, and if that doesn't work I'm going to regenerate the DELLY BCF. The OTHER DELLY VCF works, which is a nice control. If none of this works, I'll try to fix it manually.

(base) [sdada@gphost14 DELLY_0.8.1]$ grep -B 10 'chr8:43240816' file_dellysomatic_v81_hg38header.vcf
chrUn_KI270336v1        2       BND00064084     A       [chrUn_KI270467v1:711[A .       PASS    IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chrUn_KI270467v1;END=711;PE=11;MAPQ=23;CT=5to5;CIPOS=-51,51;CIEND=-51,51  GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-114.835,0,-264.594:10000:PASS:0:522:234:4:52:108:0:0       0/1:-120.162,0,-238.566:10000:PASS:0:1665:388:9:46:88:0:0
chrUn_KI270336v1        284     BND00064085     T       T]chrUn_KI270467v1:2612]        .       LowQual IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chrUn_KI270467v1;END=2612;PE=2;MAPQ=24;CT=3to3;CIPOS=-395,395;CIEND=-395,395      GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/0:0,-1000,-1000:10000:PASS:472:759:25470:0:6944:385:0:0       0/1:-1000,0,-1000:10000:PASS:1314:1830:29355:0:3639:1227:0:0
chrUn_KI270336v1        315     DEL00064086     A       <DEL>   .       LowQual IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chrUn_KI270336v1;END=831;PE=2;MAPQ=6;CT=3to5;CIPOS=-215,215;CIEND=-215,215        GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/0:0,-47.4989,-383.219:10000:PASS:497:297:1161:0:175:0:0:0     0/0:0,-56.4709,-532.986:10000:PASS:1529:451:1669:0:203:1:0:0
chrUn_KI270336v1        440     BND00064087     C       C]chrUn_KI270466v1:1097]        .       LowQual IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chrUn_KI270466v1;END=1097;PE=3;MAPQ=21;CT=3to3;CIPOS=-50,50;CIEND=-50,50  GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-30.5308,0,-1000:10000:PASS:514:1810:4731:1:664:121:0:0     0/1:-796.708,0,-1000:10000:PASS:1586:3228:7851:1:1473:401:0:0
chrUn_KI270336v1        444     BND00064088     T       T]chrUn_KI270467v1:1485]        .       LowQual IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chrUn_KI270467v1;END=1485;PE=2;MAPQ=21;CT=3to3;CIPOS=-50,50;CIEND=-50,50  GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-103.653,0,-399.956:10000:PASS:514:1821:1128:2:143:116:0:0  0/1:-536.932,0,-484.266:10000:PASS:1609:3349:2367:2:205:424:0:0
chrUn_KI270336v1        781     BND00064089     G       [chr3:93470364[G        .       LowQual PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr3;END=93470364;PE=0;MAPQ=0;CT=5to5;CIPOS=-4,4;CIEND=-4,4;SRMAPQ=12;INSLEN=0;HOMLEN=4;SR=5;SRQ=0.952;CONSENSUS=TGATATTTTTTGTACAGTATAGAATATATACTTTGGGTATTTTGATATTTTATGTACAGTATACAATGTATGGTTTCTGAACTTTGATATTTCATGTAGAGTATAAAATATATATTTGGGGTACA;CE=1.73818 GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   1/1:-1000,-199.238,0:10000:PASS:299:1496:158359:0:0:0:32:923    1/1:-1000,-65.3614,0:10000:PASS:568:2387:152387:0:0:0:19:346
chrUn_KI270336v1        861     BND00064090     T       T]chr3:93470801]        .       PASS    IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr3;END=93470801;PE=37;MAPQ=27;CT=3to3;CIPOS=-50,50;CIEND=-50,50 GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-1000,0,-195.757:10000:PASS:447:1447:49:6:135:400:0:0       1/1:-1000,-524.548,0:10000:PASS:672:2094:54:6:175:3134:0:0
chrUn_KI270336v1        893     BND00064091     C       C[chr4:51107269[        .       LowQual IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr4;END=51107269;PE=4;MAPQ=23;CT=3to5;CIPOS=-469,469;CIEND=-469,469      GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-81.404,0,-261.164:10000:PASS:646:1446:8:4:137:137:0:0      0/1:-42.8954,0,-334.227:10000:PASS:847:2086:21:5:187:80:0:0
chrUn_KI270336v1        909     BND00064092     A       [chr3:93470362[A        .       LowQual PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr3;END=93470362;PE=0;MAPQ=0;CT=5to5;CIPOS=-9,9;CIEND=-9,9;SRMAPQ=9;INSLEN=0;HOMLEN=8;SR=95;SRQ=0.970414;CONSENSUS=TACAGTATAGAATATATACCTTGGGTACTTTGATATTTTATGTACAGTATATAATATATGGTTTGTGAACTTTGATATTTCATGTAGAGTATAAAATATATATTTGGGGTACATTGATATTATATGTACAGTATATAATCTATATTTGATGTACTTTCATATTTTATGT;CE=1.73195  GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   1/1:-1000,-1000,0:10000:PASS:849:1443:158359:0:0:0:521:12926    1/1:-1000,-1000,0:10000:PASS:991:2082:152386:0:0:0:423:7575
chrUn_KI270336v1        926     BND00064093     T       T]chr3:93470799]        .       PASS    PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr3;END=93470799;PE=0;MAPQ=0;CT=3to3;CIPOS=-13,13;CIEND=-13,13;SRMAPQ=32;INSLEN=0;HOMLEN=12;SR=5;SRQ=0.968504;CONSENSUS=AAATATAGATTATATACTGTACATAAAATATCAAAGTACCCCAATATATATTATATACTGTACATGAAATATCAAAGTTCACAAACTATATATTATGTACTGTACATAAAATATCAAAGTACCCA;CE=1.72472 GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   1/1:-1000,-1000,0:10000:PASS:1211:1442:49:2:0:0:175:13374       1/1:-1000,-1000,0:10000:PASS:1225:2072:54:3:0:0:183:7818
chrUn_KI270337v1        2       BND00064094     N       ]chr8:43240816] .       LowQual PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr8;END=43240816;PE=0;MAPQ=0;CT=5to3;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=15;INSLEN=0;HOMLEN=1;SR=3;SRQ=1;CONSENSUS=ATTGTATACTGTACATAAAATATCAAAGTATCCAAAGTATGTATTATAAGCTGTAGATAAAATATCAAAGTACCCAAACTATATATTATATACTGTACATAAAATATGAAAGTACCCAAAGTAT;CE=1.76063      GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-755.564,0,-50.8002:10000:PASS:0:1700:167:20:0:0:64:298     0/1:-81.8446,0,-127.519:10000:PASS:0:5742:270:43:0:0:68:51

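
Rather than waiting for the conversion to fail on one record at a time, every record with this pattern can be listed up front. The following Python sketch assumes a plain-text, tab-delimited VCF; the helper name `find_bare_breakends` is hypothetical and is not part of MAVIS, DELLY, or BCFtools. It reports breakend ALT values that have brackets but no base on either side.

```python
import re

# A breakend ALT with brackets but no sequence on either side, e.g. ]chr8:43240816]
BARE_BND = re.compile(r"^[\[\]][^\[\]]+:\d+[\[\]]$")


def find_bare_breakends(vcf_lines):
    """Return (line_number, chrom, pos, id, alt) for each bare breakend ALT.

    Hypothetical helper; expects tab-delimited VCF text lines.
    """
    hits = []
    for line_number, line in enumerate(vcf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, record_id, _ref, alt = fields[:5]
        for allele in alt.split(","):
            if BARE_BND.match(allele):
                hits.append((line_number, chrom, int(pos), record_id, allele))
    return hits


# Two shortened records based on the grep output above.
example = [
    "##fileformat=VCFv4.2\n",
    "chrUn_KI270336v1\t926\tBND00064093\tT\tT]chr3:93470799]\t.\tPASS\tSVTYPE=BND\n",
    "chrUn_KI270337v1\t2\tBND00064094\tN\t]chr8:43240816]\t.\tLowQual\tSVTYPE=BND\n",
]
for hit in find_bare_breakends(example):
    print(hit)
```

To scan a real file, the same function can be given an open file handle, for example `find_bare_breakends(open("file.vcf"))`, where `file.vcf` stands in for the converted DELLY output.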
creisle commented 3 years ago

ok, looks like this is an input issue rather than a bug so I am going to remove the bug label for now. If you are able to determine whether it is a problem with DELLY or BCFtools then you can reference this issue when you create one in the corresponding repo for the bug. In the meantime if we are