bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

svict compatibility #206

Open · arnavaz opened this issue 4 years ago

arnavaz commented 4 years ago

I am submitting this issue to ask about MAVIS compatibility with the svict SV caller: https://github.com/vpc-ccg/svict/blob/master/PUBLICATION.md

MAVIS version:

Python version:

OS:

Expected Behaviour

Actual Behaviour

Steps to Reproduce the Behaviour

creisle commented 4 years ago

Hi @arnavaz!

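Thanks for opening this issue. I've taken a quick look at the tool. It looks like they use a non-standard VCF output (custom SVTYPEs, e.g. SVTYPE=INS:R), which means it may require a custom loader (http://mavis.bcgsc.ca/docs/latest/supported_dependencies.html#writing-a-custom-conversion-script). We could also add it as a new type of loader to MAVIS; this would likely be a small adaptation of the existing VCF loader (http://mavis.bcgsc.ca/docs/latest/supported_dependencies.html#general-vcf-inputs).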
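As a rough illustration of the extra step such a loader would need, here is a minimal Python sketch (not MAVIS code) that splits a VCF INFO column and reduces a custom SVTYPE such as INS:R to its base type. It assumes the part before the colon is a standard SV type; what svict's suffix (e.g. :R) means is not handled here, and the END value in the example is made up.

```python
def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column ("KEY=VALUE;FLAG;...") into a dict.

    Flags without a value (e.g. PRECISE) are stored as True.
    A missing INFO column (".") gives an empty dict.
    """
    if info_field == ".":
        return {}
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue  # tolerate empty entries such as a trailing ';'
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def base_svtype(svtype: str) -> str:
    """Reduce a custom SVTYPE such as "INS:R" to its base type ("INS").

    Assumption: the text before the first ':' is a standard VCF SV type and
    the suffix is tool-specific detail a loader could keep elsewhere.
    """
    return svtype.split(":", 1)[0]


if __name__ == "__main__":
    # Hypothetical INFO column; END=1500 is a made-up value for illustration.
    info = parse_info("SVTYPE=INS:R;END=1500;PRECISE")
    print(base_svtype(info["SVTYPE"]), info["END"], info["PRECISE"])
```

Keeping the suffix out of the base type means the rest of an existing VCF-style loader could, in principle, be reused unchanged.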

It's also worth pointing out that we've only used MAVIS for whole genomes/transcriptomes; use on "targeted sequencing data" may have other issues. For instance, one of the metrics we build into the configuration file depends on the reads in the BAM files being paired and their fragment sizes approximating a normal distribution (http://mavis.bcgsc.ca/docs/latest/theory.html#determining-flanking-support). If you run into any issues when automatically generating your config files, it's likely related to that. What is the depth of coverage like in these BAMs? Are they paired-end reads? What did you use to align them?
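One rough way to check whether fragment sizes look approximately normal is sketched below. This is a hypothetical helper, not how MAVIS computes its metrics: it assumes you have already extracted SAM TLEN values for properly paired reads (for example with samtools or pysam) and compares the fraction of fragments within mean ± k·SD against the fraction a normal distribution would predict.

```python
import statistics
from statistics import NormalDist


def fragment_size_summary(tlens, k=2.0):
    """Compare observed fragment sizes with a normal fit.

    tlens: SAM TLEN values for properly paired reads, ideally one per pair.
    The sign is dropped (TLEN is negative for the rightmost mate) and 0
    (unavailable) is skipped.
    Returns (mean, sd, observed, expected): observed is the fraction of
    fragments within mean +/- k*sd, expected is the fraction a normal
    distribution with the same mean and sd puts in that interval.
    """
    sizes = [abs(t) for t in tlens if t != 0]
    if len(sizes) < 2:
        raise ValueError("need at least two non-zero TLEN values")
    mean = statistics.fmean(sizes)
    sd = statistics.stdev(sizes)
    if sd == 0:
        raise ValueError("all fragment sizes are identical; no spread to model")
    observed = sum(abs(s - mean) <= k * sd for s in sizes) / len(sizes)
    normal = NormalDist(mean, sd)
    expected = normal.cdf(mean + k * sd) - normal.cdf(mean - k * sd)
    return mean, sd, observed, expected


if __name__ == "__main__":
    # Illustrative values only, not real data.
    mean, sd, observed, expected = fragment_size_summary([300, -310, 290, -305, 0, 295])
    print(f"mean={mean:.1f} sd={sd:.1f} observed={observed:.2f} normal={expected:.2f}")
```

A large gap between the observed and expected fractions would suggest the normal approximation is a poor fit for that library, which is the situation where automatically generated config values may be unreliable.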

I think before we add a loader for this tool type, it would be worth manually converting a couple of lines to the standard MAVIS format (http://mavis.bcgsc.ca/docs/latest/mavis_input.html#required-columns) and trying to run MAVIS with that as input along with your BAM files. Pick a couple of SVs you expect to be retained, if you can.
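A minimal sketch of that manual conversion step, assuming the six required columns listed in the MAVIS input documentation (break1_/break2_ chromosome, position_start and position_end) and that optional columns can be left out. The function name write_mavis_tsv and the placeholder coordinates are hypothetical.

```python
import csv
import io

# Assumption: the required columns of the MAVIS standard input format,
# as listed in the "required columns" section of the MAVIS input docs.
MAVIS_COLUMNS = [
    "break1_chromosome",
    "break1_position_start",
    "break1_position_end",
    "break2_chromosome",
    "break2_position_start",
    "break2_position_end",
]


def write_mavis_tsv(calls, handle):
    """Write SV calls as a tab-delimited table with a header row.

    calls: iterable of dicts keyed by MAVIS_COLUMNS. For an exact breakpoint,
    *_position_start and *_position_end hold the same value.
    """
    writer = csv.DictWriter(handle, fieldnames=MAVIS_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for call in calls:
        writer.writerow(call)


if __name__ == "__main__":
    # Hypothetical translocation between two placeholder coordinates.
    call = {"break1_chromosome": "chr1", "break1_position_start": 1000,
            "break1_position_end": 1000, "break2_chromosome": "chr2",
            "break2_position_start": 2000, "break2_position_end": 2000}
    buf = io.StringIO()
    write_mavis_tsv([call], buf)
    print(buf.getvalue(), end="")
```

To write a real file, open it with `open(path, "w", newline="")`; a .tab or .tsv extension avoids suggesting the content is VCF.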

Does that sound doable?

arnavaz commented 4 years ago

Hi Caralyn, thanks a lot for your response.

The reads are paired. The depth of coverage is on the order of several thousand, and the aligner used is BWA.

It all sounds good. I will try the standard mavis format for the inputs and test it on a couple of calls. Will get back to you on this when I get the results.

Thanks again, Arnavaz

arnavaz commented 4 years ago

Hi Caralyn,

Sorry to bother you again. I just need a bit of your help with mavis pipeline.

For the mavis --convert option, I use 'svict' as the alias and a file in the MAVIS standard format as the input, but I get an error.

My command is:

    mavis config \
      --library LIB_test genome diseased False $BAMDIR/test_input.bam \
      --convert svict /mavis_test/test_input.vcf svict \
      --assign LIB_test svict \
      -w $OUTDIR/inut_test/mavis.cfg

I get the error:

    mavis: error: argument --convert is missing input file path(s): ['svict', 'mavis_test/test_input.vcf', 'svict']
    usage: mavis setup [-h] [-v] [--log LOG] [--log_level {INFO,DEBUG}] -o OUTPUT [--skip_stage {cluster,validate}] FILEPATH

Is this the right way to do this? Can you help with this, please?

Thanks a lot! Arna

creisle commented 4 years ago

The error is popping up because the argument parser is treating the final svict as a file path. The --convert option only works with one of the known input types after the file path(s) (see the help menu output below):

  --convert <alias> FILEPATH [FILEPATH ...] {manta,delly,transabyss,pindel,chimerascan,mavis,defuse,breakdancer,vcf,breakseq,cnvnator,strelka,starfusion} [stranded]
                        input file conversion for internally supported tools
                        (default: [])

Since your file is already in the standard MAVIS format, you can pass it as type "mavis" instead:

    mavis config \
      --library LIB_test genome diseased False $BAMDIR/test_input.bam \
      --convert svict /mavis_test/test_input.vcf mavis \
      --assign LIB_test svict -w $OUTDIR/inut_test/mavis.cfg
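To illustrate why the first command failed, here is a hypothetical re-implementation of the usage line from the help output. It is not MAVIS's own argument parser and its error text differs, but it shows the rule: the token after the file path(s) must be a supported type, so a trailing svict is rejected while mavis is accepted. Treating the optional [stranded] value as a literal True/False is an assumption.

```python
# Input types as listed in the --convert help output.
SUPPORTED_TYPES = {
    "manta", "delly", "transabyss", "pindel", "chimerascan", "mavis",
    "defuse", "breakdancer", "vcf", "breakseq", "cnvnator", "strelka",
    "starfusion",
}


def split_convert_args(tokens):
    """Split --convert tokens into (alias, file paths, input type, stranded).

    Follows '--convert <alias> FILEPATH [FILEPATH ...] {type} [stranded]'.
    stranded is None when it is not given.
    """
    tokens = list(tokens)
    stranded = None
    # Assumption: the optional trailing [stranded] value is written as True/False.
    if tokens and tokens[-1] in ("True", "False"):
        stranded = tokens.pop() == "True"
    if len(tokens) < 3 or tokens[-1] not in SUPPORTED_TYPES:
        raise ValueError(
            f"expected one of {sorted(SUPPORTED_TYPES)} after the file path(s), "
            f"got {tokens[-1:]}"
        )
    alias, *paths, input_type = tokens
    return alias, paths, input_type, stranded


if __name__ == "__main__":
    for args in (["svict", "/mavis_test/test_input.vcf", "svict"],
                 ["svict", "/mavis_test/test_input.vcf", "mavis"]):
        try:
            print(split_convert_args(args))
        except ValueError as err:
            print("error:", err)
```

The alias (svict) stays free-form; only the type token is validated, which is why svict works as an alias but not as the input type.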

I notice from the path, however, that it still has a .vcf extension. The MAVIS format is tab-delimited; are you passing a tab-delimited file named .vcf, or is this actually a VCF input?
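A quick way to tell the two formats apart is sketched below (a hypothetical helper, not part of MAVIS). It relies on the VCF specification requiring the first line to be ##fileformat=VCFvX.Y, and assumes a MAVIS-format table starts with a header row containing break1_chromosome, optionally prefixed with #.

```python
def sniff_sv_format(lines):
    """Guess the format from the first non-blank line.

    Returns "vcf", "mavis" or "unknown". Accepts any iterable of lines,
    e.g. an open file handle.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("##fileformat=VCF"):
            return "vcf"
        # Assumed MAVIS header row; a leading '#' is tolerated.
        header = line.lstrip("#").split("\t")
        return "mavis" if "break1_chromosome" in header else "unknown"
    return "unknown"


if __name__ == "__main__":
    print(sniff_sv_format(["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\n"]))
    print(sniff_sv_format(["break1_chromosome\tbreak1_position_start\n"]))
```

With a file on disk this would be called as `with open(path) as fh: sniff_sv_format(fh)`, which reads only up to the first non-blank line.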

arnavaz commented 4 years ago

Thanks a lot! My file is tab-delimited; I just kept the .vcf extension. I probably should have changed the extension too. I will try again. Thanks again for your help. Arnavaz

arnavaz commented 4 years ago

Hi Caralyn,

I ran MAVIS with the settings you suggested and it worked perfectly well. I had one validated translocation in the sample, and it was output in the summary table. So it seems MAVIS is compatible with the svict caller. We can now use the standard format for our other samples too.

Thanks a lot, Arnavaz

creisle commented 4 years ago

@arnavaz Awesome! Can you post the details of the event you manually converted here? I will use it as a template for writing the parser and adding svict as a standard input tool.