bioinformatics-centre / BayesTyper

A method for variant graph genotyping based on exact alignment of k-mers
86 stars 7 forks source link

convertAllele assertion failure #6

Open hannespetur opened 5 years ago

hannespetur commented 5 years ago

Hello BayesTyper team,

first of all congratulations on your recent publication.

I would like to evaluate BayesTyper and compare it with my own work but I am running into some issues so I was wondering if you could help me. I am using your snakemake workflow on three platinum genome samples (NA12878, NA12891, NA12892) but in the manta_conv_all_all rule I get this assertion failure with all three samples of my samples:

bayesTyperTools: /isdata/kroghgrp/jasi/bayesTyper/code/releases/v1.3.1_static/BayesTyper/src/bayesTyperTools/ConvertAllele.cpp:119: void ConvertAllele::convertAllele(const string&, const string&, const string&, const string&, bool): Assertion `genome_seqs_it != genome_seqs.end()' failed.
/usr/bin/bash: line 1: 162862 Aborted                 (core dumped) /nfs/odinn/tmp/hannese/requests/180704_bayestyper_dataset/bayestyper/bayesTyper_v1.3.1_linux_x86_64/bin/bayesTyperTools convertAllele -v manta/NA12892/results/variants/candidateSV.vcf.gz -g /odinn/tmp/hannese/requests/180704_bayestyper_dataset/data_bundle/bayestyper_GRCh38_bundle/GRCh38.fa -z -o manta/NA12892/results/variants/candidateSV_converted > manta/NA12892_convert.log

I am using BayesTyper 1.3.1 and Manta 1.0.3 (I have also tried Manta 1.4 but I got the same assertion failure) on GRCh38.

The .log file:

[17/07/2018 19:43:05] You are using BayesTyperTools (v1.3.1)

[17/07/2018 19:43:05] Running BayesTyperTools (v1.3.1) convertAllele ...

[17/07/2018 19:44:09] Parsed 3366 chromosome(s)
[17/07/2018 19:44:09] Parsed 0 mobile element insertion sequence(s)

[17/07/2018 19:53:03] Parsed 100000 variant(s) ...

Full slurm output file:

[Tue Jul 17 19:43:05 2018] Building DAG of jobs...
[Tue Jul 17 19:43:05 2018] Using shell: /usr/bin/bash
[Tue Jul 17 19:43:05 2018] Provided cores: 24
[Tue Jul 17 19:43:05 2018] Rules claiming more threads will be scaled down.
[Tue Jul 17 19:43:05 2018] Job counts:
[Tue Jul 17 19:43:05 2018]      count   jobs
[Tue Jul 17 19:43:05 2018]      1       manta_conv_all_id
[Tue Jul 17 19:43:05 2018]      1

[Tue Jul 17 19:43:05 2018] rule manta_conv_all_id:
[Tue Jul 17 19:43:05 2018]     input: manta/NA12892/results/variants/candidateSV.vcf.gz
[Tue Jul 17 19:43:05 2018]     output: manta/NA12892/results/variants/candidateSV_converted.vcf.gz
[Tue Jul 17 19:43:05 2018]     log: manta/NA12892_convert.log
[Tue Jul 17 19:43:05 2018]     jobid: 0
[Tue Jul 17 19:43:05 2018]     wildcards: sample_id=NA12892

bayesTyperTools: /isdata/kroghgrp/jasi/bayesTyper/code/releases/v1.3.1_static/BayesTyper/src/bayesTyperTools/ConvertAllele.cpp:119: void ConvertAllele::convertAllele(const string&, const string&, const string&, const string&, bool): Assertion `genome_seqs_it != genome_seqs.end()' failed.
/usr/bin/bash: line 1: 162862 Aborted                 (core dumped) /nfs/odinn/tmp/hannese/requests/180704_bayestyper_dataset/bayestyper/bayesTyper_v1.3.1_linux_x86_64/bin/bayesTyperTools convertAllele -v manta/NA12892/results/variants/candidateSV.vcf.gz -g /odinn/tmp/hannese/requests/180704_bayestyper_dataset/data_bundle/bayestyper_GRCh38_bundle/GRCh38.fa -z -o manta/NA12892/results/variants/candidateSV_converted > manta/NA12892_convert.log
[Tue Jul 17 19:53:26 2018] Error in rule manta_conv_all_id:
[Tue Jul 17 19:53:26 2018]     jobid: 0
[Tue Jul 17 19:53:26 2018]     output: manta/NA12892/results/variants/candidateSV_converted.vcf.gz
[Tue Jul 17 19:53:26 2018]     log: manta/NA12892_convert.log

[Tue Jul 17 19:53:26 2018] RuleException:
[Tue Jul 17 19:53:26 2018] CalledProcessError in line 101 of /nfs/odinn/tmp/hannese/requests/180704_bayestyper_dataset/workflows/rules/call_candidates.smk:
[Tue Jul 17 19:53:26 2018] Command ' set -euo pipefail;  /nfs/odinn/tmp/hannese/requests/180704_bayestyper_dataset/bayestyper/bayesTyper_v1.3.1_linux_x86_64/bin/bayesTyperTools convertAllele -v manta/NA12892/results/variants/candidateSV.vcf.gz -g /odinn/tmp/hannese/requests/180704_bayestyper_dataset/data_bundle/bayestyper_GRCh38_bundle/GRCh38.fa -z -o manta/NA12892/results/variants/candidateSV_converted > manta/NA12892_convert.log ' returned non-zero exit status 134.
[Tue Jul 17 19:53:26 2018]   File "/nfs/odinn/tmp/hannese/requests/180704_bayestyper_dataset/workflows/rules/call_candidates.smk", line 101, in __rule_manta_conv_all_id
[Tue Jul 17 19:53:26 2018]   File "/nfs/prog/bioinfo/apps-x86_64/python/3.6.3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
[Tue Jul 17 19:53:26 2018] Removing output files of failed job manta_conv_all_id since they might be corrupted:
[Tue Jul 17 19:53:26 2018] manta/NA12892/results/variants/candidateSV_converted.vcf.gz
[Tue Jul 17 19:53:27 2018] Will exit after finishing currently running jobs.
[Tue Jul 17 19:53:27 2018] Shutting down, this might take some time.
[Tue Jul 17 19:53:27 2018] Exiting because a job execution failed. Look above for error message
[Tue Jul 17 19:53:27 2018] Complete log: /nfs/odinn/tmp/hannese/requests/180704_bayestyper_dataset/workflows/.snakemake/log/2018-07-17T194304.495091.snakemake.log

Any help appreciated. Hannes

lassemaretty commented 5 years ago

Hi Hannes,

Thank you for posting! We dont know right away what is going on. We are currently both on holiday and I cant take a proper look until Im back Monday June 30. Ill look into it first thing when Im back. I hope that can work for you.

Best regards,

Lasse

hannespetur commented 5 years ago

Yes of course. Enjoy your holidays.

lassemaretty commented 5 years ago

Hi Hannes,

The assertion is raised because a chromsome containing variants in the vcf is not in the reference fasta. To figure out what is going on, could you compare the sequence ids occurring in column 1 in the Manta vcf file with the sequence ids in the reference to see if there are variants on a chromosome in the vcf that is not in the reference file? As Manta uses the same reference as the convertAllele script in the Snakemake workflow it would be unexpected, but it should still be checked.

On a separate note we discovered some inconsistencies in how our reference sequences in the data bundle were generated and we have therefore recently (June 19) updated the data bundle. I noticed that your reference is called GRCh38.fa suggesting that you may have downloaded the older version of the data bundle, but it should only affect the canon/decoy files required by downstream by BayesTyper. But to be on the safe side, you should update to the current data bundle. We will add a version number to the data bundles in the future.

br,

Lasse

hannespetur commented 5 years ago

Hey Lasse,

Thank you for looking into this. But I did not see any contigs in the Manta VCF which are not part of the reference I am using. Here is the Manta output for NA12891 in case you'd like to take a look: https://www.dropbox.com/s/xxe7hht1sebpt3y/candidateSV.vcf.gz?dl=0

And I'm using the most recent data bundle. GRCh38.fa and GRCh38.fa.fai are symlinks to Homo_sapiens_assembly38.fasta and Homo_sapiens_assembly38.fasta.fai, respectively, which I created since I noticed that in the call_candidates_and_genotype.smk file that it expects the reference to be at that path. Should I handle this somehow differently?

I'm next going to next to change the Manta rule such that Manta only genotypes the primary autosome contigs (chr1-chr22) and see if the same assertion occurs. I am only interested in evaluating those contigs anyway.

All the best, Hannes

hannespetur commented 5 years ago

Things seem to be working alright by skipping the decoy sequences in Manta.

lassemaretty commented 5 years ago

Hi Hannes,

Thanks for the update - Im happy that you found a solution. The assertion was raised due to an issue with the parsing of the fasta headers for some of the decoy sequences. I took the quick route and fixed the fasta headers and updated the data bundles - I also corrected the file naming issue you reported. I tested the new bundles on your Manta vcf file and it did not crash (it did on the old bundle). The long-term solution with a patch for the BayesTyper fasta parser will follow later.

Please do not hesitate to post again if you encounter other issues.

Best regards,

Lasse