bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

Error from Manta input #213

Closed moldach closed 4 years ago

moldach commented 4 years ago

The mavis config step creates a mavis.cfg for the MAVIS standard input file formats (.bam and .vcf); however, according to the documentation, the Reference Input Files should be set up using environment variables or entered in the mavis.cfg file manually.

Set up mavis.cfg

mavis config \
    --library maddog genome normal False 470.sorted.dedupped.bam \
    --convert manta diploidSV.vcf manta \
    --assign maddog manta  \
    -w mavis.cfg

Manually include references

This is what my directory looks like:

(mavis) [moldach - MAVIS-MADDOG]$ ll
total 5461660
-rw-r----- 1 moldach moldach 5352369367 Jun 23 12:32 470.sorted.dedupped.bam
-rw-r----- 1 moldach moldach     307672 Jun 23 12:32 470.sorted.dedupped.bam.bai
-rw-r----- 1 moldach moldach   25639999 Jun 23 07:57 ce11.2bit
-rw-r----- 1 moldach moldach   34147927 Jun 23 07:47 celegan2.json
-rwxr-x--- 1 moldach moldach  101957874 Jun 23 07:48 c_elegans.PRJNA13758.WS265.genomic.fa
-rw-r----- 1 moldach moldach      35325 Jun 23 14:23 diploidSV.vcf.gz
-rw-r----- 1 moldach moldach       4014 Jun 23 14:24 diploidSV.vcf.gz.tbi
-rw-r----- 1 moldach moldach          0 Jun 23 14:36 log.txt
-rw-r----- 1 moldach moldach   78248661 Jun 23 14:22 mavis_CeDNR_annotations.tab
-rw-r----- 1 moldach moldach        666 Jun 23 14:35 mavis.cfg

So I'll add the required references to mavis.cfg:

[reference]
template_metadata =
masking =
annotations = /scratch/moldach/MAVIS-MADDOG/celegan2.json
aligner_reference = /scratch/moldach/MAVIS-MADDOG/ce11.2bit
dgv_annotation = /scratch/moldach/MAVIS-MADDOG/mavis_CeDNR_annotations.tab
reference_genome = /scratch/moldach/MAVIS-MADDOG/c_elegans.PRJNA13758.WS265.genomic.fa

[maddog]
library = maddog
protocol = genome
bam_file = 470.sorted.dedupped.bam
read_length = 101
median_fragment_size = 342
stdev_fragment_size = 74
strand_specific = False
strand_determining_read = 2
disease_status = normal
inputs = manta

[convert]
assume_no_untemplated = True
manta = convert_tool_output
        diploidSV.vcf.gz
        manta
        False

Running MAVIS

I'm getting the following error:

(mavis) [moldach@cedar1 MAVIS-MADDOG]$ mavis setup mavis.cfg -o output_dir/ >> log.txt
                      MAVIS: 2.2.6
                      hostname: cedar1.cedar.computecanada.ca
[2020-06-23 14:36:41] arguments
                        command = 'setup'
                        config = '/scratch/moldach/MAVIS-MADDOG/mavis.cfg'
                        log = None
                        log_level = 'INFO'
                        output = 'output_dir/'
                        skip_stage = []
                      creating output directory: 'output_dir/converted_inputs'
                      setting up the directory structure for maddog as /scratch/moldach/MAVIS-MADDOG/output_dir/maddog_normal_genome
                      converting input command: ['convert_tool_output', '/scratch/moldach/MAVIS-MADDOG/diploidSV.vcf.gz', 'manta', False]
                      reading: /scratch/moldach/MAVIS-MADDOG/diploidSV.vcf.gz
                      found 425 rows
                      Error in converting row {'id': 'MantaBND:47:0:2:0:0:0:1', 'break2_orientation': 'L', 'untemplated_seq': '', 'break1_chromosome': 'I', 'break2_chromosome': 'I', 'break1_position_start': 1059667, 'break1_position_end': 1060026, 'break2_position_start': 1101776, 'break2_position_end': 1101776, 'event_type': 'BND', 'MATEID': 'MantaBND:47:0:2:0:0:0:0', 'IMPRECISE': True, 'BND_DEPTH': 90, 'MATE_BND_DEPTH': 41}
Traceback (most recent call last):
  File "/home/moldach/bin/mavis/bin/mavis", line 10, in <module>
    sys.exit(main())
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/main.py", line 414, in main
    raise err
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/main.py", line 397, in main
    pipeline = _pipeline.Pipeline.build(config)
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 311, in build
    libconf.inputs = run_conversion(config, libconf, conversion_dir)
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 75, in run_conversion
    output_tabbed_file(convert_tool_output(
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 73, in convert_tool_output
    result.extend(_convert_tool_output(fname, file_type, stranded, log, assume_no_untemplated=assume_no_untemplated))
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 535, in _convert_tool_output
    raise err
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 532, in _convert_tool_output
    std_rows = _convert_tool_row(row, file_type, stranded, assume_no_untemplated=assume_no_untemplated)
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 466, in _convert_tool_row
    raise UserWarning(
UserWarning: ('row failed to create any breakpoint pairs. This generally indicates an input formatting error', {'id': 'MantaBND:47:0:2:0:0:0:1', 'break2_orientation': 'L', 'untemplated_seq': '', 'break1_chromosome': 'I', 'break2_chromosome': 'I', 'break1_position_start': 1059667, 'break1_position_end': 1060026, 'break2_position_start': 1101776, 'break2_position_end': 1101776, 'event_type': 'BND', 'MATEID': 'MantaBND:47:0:2:0:0:0:0', 'IMPRECISE': True, 'BND_DEPTH': 90, 'MATE_BND_DEPTH': 41}, {'tracking_id': 'manta-MantaBND:47:0:2:0:0:0:1', 'break1_orientation': '?', 'break2_orientation': 'L', 'break1_strand': ['?'], 'break2_strand': ['?'], 'id': 'MantaBND:47:0:2:0:0:0:1', 'untemplated_seq': '', 'break1_chromosome': 'I', 'break2_chromosome': 'I', 'break1_position_start': 1059667, 'break1_position_end': 1060026, 'break2_position_start': 1101776, 'break2_position_end': 1101776, 'event_type': 'BND', 'MATEID': 'MantaBND:47:0:2:0:0:0:0', 'IMPRECISE': True, 'BND_DEPTH': 90, 'MATE_BND_DEPTH': 41}, [('L', 'L', '?', '?', 'translocation', True), ('L', 'L', '?', '?', 'translocation', False), ('L', 'L', '?', '?', 'inverted translocation', True), ('L', 'L', '?', '?', 'inverted translocation', False), ('R', 'L', '?', '?', 'translocation', True), ('R', 'L', '?', '?', 'translocation', False), ('R', 'L', '?', '?', 'inverted translocation', True), ('R', 'L', '?', '?', 'inverted translocation', False)])

creisle commented 4 years ago

@moldach would you be able to pull out the row from the VCF that errored, along with its mate, and paste it here? From just this error message it looks like break1_orientation is missing, but I can't tell more without the data itself.
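
One way to pull the failing record and its mate out of the VCF is sketched below. It assumes a tab-separated VCF (plain or gzip/bgzip-compressed), that the ID in the error message matches column 3, and that MATEID names a single mate. The function names are made up for illustration.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def record_with_mate(vcf_path, record_id):
    """Return the VCF lines for record_id and for the record in its MATEID.

    Hypothetical helper. Loads all records into memory, which is fine for
    a few hundred rows but not intended for very large files.
    """
    by_id = {}
    with open_vcf(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 8:
                by_id[fields[2]] = fields

    record = by_id.get(record_id)
    if record is None:
        return []
    found = ["\t".join(record)]

    # INFO is column 8: semicolon-separated KEY=VALUE pairs and bare flags.
    info = dict(item.partition("=")[::2] for item in record[7].split(";"))
    mate = by_id.get(info.get("MATEID", ""))
    if mate is not None:
        found.append("\t".join(mate))
    return found
```

For the error above, the call would be record_with_mate("diploidSV.vcf.gz", "MantaBND:47:0:2:0:0:0:1"), printing each returned line.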

ramsainanduri commented 4 years ago

I have been getting the same error

My config

[reference]
template_metadata = /reference_inputs/cytoBand.txt
masking = /reference_inputs/hg19_masking.tab
annotations = /reference_inputs/ensembl69_hg19_annotations.json
aligner_reference = /reference_inputs/Genomes/Human_genome/hg19.fa
dgv_annotation = /reference_inputs/reference_inputs/dgv_hg19_variants.tab
reference_genome = /reference_inputs/Genomes/Human_genome/hg19.fa

[S14N]
library = S14N
protocol = genome
bam_file = /home/ram.nanduri/SV_VCFS/Bam/S14N.recaled.bam
read_length = None
median_fragment_size = None
stdev_fragment_size = None
strand_specific = False
strand_determining_read = 2
disease_status = normal
inputs = manta

[S14T]
library = S14T
protocol = genome
bam_file = /home/ram.nanduri/SV_VCFS/Bam/S14T.recaled.bam
read_length = None
median_fragment_size = None
stdev_fragment_size = None
strand_specific = False
strand_determining_read = 2
disease_status = diseased
inputs = manta

[convert]
assume_no_untemplated = True
manta = convert_tool_output
        /home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/diploidSV.vcf.gz
        /home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/somaticSV.vcf.gz
        manta
        False

My Log

                      MAVIS: 2.2.6
                      hostname: 10.1.11.37
[2020-06-30 01:19:19] arguments
                        command = 'setup'
                        config = '/home/ram.nanduri/SV_VCFS/MAVIS_ORI_TEST/S14N_vs_S14T.mavis.cfg'
                        log = 'S14N_vs_S14T.Run.log'
                        log_level = 'INFO'
                        output = 'S14N_vs_S14T.mavis.output/'
                        skip_stage = [ 'cluster' 'validate' ]
                      creating output directory: 'S14N_vs_S14T.mavis.output/converted_inputs'
                      setting up the directory structure for S14N as /home/ram.nanduri/SV_VCFS/MAVIS_ORI_TEST/S14N_vs_S14T.mavis.output/S14N_normal_genome
                      converting input command: ['convert_tool_output', '/home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/diploidSV.vcf.gz', '/home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/somaticSV.vcf.gz', 'manta', False]
                      reading: /home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/diploidSV.vcf.gz
                      found 290 rows
                      Error in converting row {'id': 'MantaBND:207:0:1:0:0:0:0', 'break2_orientation': 'R', 'untemplated_seq': 'GCCCCAT', 'break1_chromosome': 'chr1', 'break2_chromosome': 'chr1', 'break1_position_start': 17051724, 'break1_position_end': 17051724, 'break2_position_start': 234912188, 'break2_position_end': 234912188, 'event_type': 'BND', 'MATEID': 'MantaBND:207:0:1:0:0:0:1', 'SVINSLEN': 7, 'SVINSSEQ': 'GCCCCAT', 'BND_DEPTH': 5, 'MATE_BND_DEPTH': 4}

Mate Pairs from the diploidSV.vcf.gz

chr1 17051724 MantaBND:207:0:1:0:0:0:0 C [chr1:234912188[GCCCCATC 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1

chr1 234912188 MantaBND:207:0:1:0:0:0:1 A [chr1:17051724[ATGGGGCA 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:0;SVINSLEN=7;SVINSSEQ=ATGGGGC;BND_DEPTH=4;MATE_BND_DEPTH=5 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1
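
For readers unfamiliar with breakend notation, the ALT column of the two records above can be decoded with a short parser. This is a sketch of the four ALT forms defined in the VCF specification (t[p[, t]p], ]p]t and [p[t), not of how MAVIS converts them; the function and key names are made up, and no mapping to MAVIS L/R orientations is attempted.

```python
import re

# Breakend ALT forms from the VCF specification, where t is the replacement
# sequence and p is the mate position written as chrom:pos:
#   t[p[    t]p]    ]p]t    [p[t
BND_ALT = re.compile(
    r"^(?P<before>[^\[\]]*)"      # t, when it is written before the mate
    r"(?P<bracket>[\[\]])"        # opening bracket, '[' or ']'
    r"(?P<chrom>.+):(?P<pos>\d+)" # mate position p
    r"(?P=bracket)"               # closing bracket must match the opening one
    r"(?P<after>[^\[\]]*)$"       # t, when it is written after the mate
)


def parse_bnd_alt(alt):
    """Split a breakend ALT string into its parts (hypothetical helper)."""
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    before, after = match.group("before"), match.group("after")
    # Exactly one side of the bracketed position carries the sequence t.
    if bool(before) == bool(after):
        raise ValueError(f"ambiguous breakend ALT: {alt!r}")
    return {
        "mate_chrom": match.group("chrom"),
        "mate_pos": int(match.group("pos")),
        "bracket": match.group("bracket"),
        "sequence": before or after,
        "sequence_first": bool(before),
    }
```

Applied to the first record, parse_bnd_alt("[chr1:234912188[GCCCCATC") reports mate position chr1:234912188, bracket "[", and the sequence written after the mate position. Both records in this pair use the [p[t form.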

creisle commented 4 years ago

thanks @ramsainanduri! That's really helpful :) I will look into this now

ramsainanduri commented 4 years ago

Hi @creisle, is this bug fixed, and when can we expect the new version?

creisle commented 4 years ago

It is fixed and will be released in 2.2.7, which should be out today or in the next couple of days. In the meantime, you can build from the fix branch here https://github.com/bcgsc/mavis/tree/bugfix/issue-213-bnd-non-trans if you like.

ramsainanduri commented 4 years ago

okay thank you.

creisle commented 4 years ago

This has now been released. https://pypi.org/project/mavis/2.2.7/ Please let me know if you have further issues
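
To confirm that an environment has picked up the release containing the fix, a version check along these lines may help. It is a sketch that assumes the package is installed under the distribution name mavis (as on PyPI) and that Python 3.8+ is in use for importlib.metadata; the function names are made up for illustration.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Turn a version string such as '2.2.7' into a tuple of up to 3 ints."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def includes_fix(version, fixed_in=(2, 2, 7)):
    """True if version is at or after the release that fixed this issue."""
    return release_tuple(version) >= fixed_in


def installed_mavis_has_fix():
    """Check the installed mavis distribution; False if it is not installed."""
    try:
        return includes_fix(metadata.version("mavis"))
    except metadata.PackageNotFoundError:
        return False
```

The log in the original report shows MAVIS 2.2.6, for which includes_fix("2.2.6") is False.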

moldach commented 4 years ago

This works great, thank you!