Open hannespetur opened 5 years ago
Hi Hannes,
Thank you for posting! We dont know right away what is going on. We are currently both on holiday and I cant take a proper look until Im back Monday June 30. Ill look into it first thing when Im back. I hope that can work for you.
Best regards,
Lasse
Yes of course. Enjoy your holidays.
Hi Hannes,
The assertion is raised because a chromsome containing variants in the vcf is not in the reference fasta. To figure out what is going on, could you compare the sequence ids occurring in column 1 in the Manta vcf file with the sequence ids in the reference to see if there are variants on a chromosome in the vcf that is not in the reference file? As Manta uses the same reference as the convertAllele script in the Snakemake workflow it would be unexpected, but it should still be checked.
On a separate note we discovered some inconsistencies in how our reference sequences in the data bundle were generated and we have therefore recently (June 19) updated the data bundle. I noticed that your reference is called GRCh38.fa suggesting that you may have downloaded the older version of the data bundle, but it should only affect the canon/decoy files required by downstream by BayesTyper. But to be on the safe side, you should update to the current data bundle. We will add a version number to the data bundles in the future.
br,
Lasse
Hey Lasse,
Thank you for looking into this. But I did not see any contigs in the Manta VCF which are not part of the reference I am using. Here is the Manta output for NA12891 in case you'd like to take a look: https://www.dropbox.com/s/xxe7hht1sebpt3y/candidateSV.vcf.gz?dl=0
And I'm using the most recent data bundle. GRCh38.fa and GRCh38.fa.fai are symlinks to Homo_sapiens_assembly38.fasta and Homo_sapiens_assembly38.fasta.fai, respectively, which I created since I noticed that in the call_candidates_and_genotype.smk file that it expects the reference to be at that path. Should I handle this somehow differently?
I'm next going to next to change the Manta rule such that Manta only genotypes the primary autosome contigs (chr1-chr22) and see if the same assertion occurs. I am only interested in evaluating those contigs anyway.
All the best, Hannes
Things seem to be working alright by skipping the decoy sequences in Manta.
Hi Hannes,
Thanks for the update - Im happy that you found a solution. The assertion was raised due to an issue with the parsing of the fasta headers for some of the decoy sequences. I took the quick route and fixed the fasta headers and updated the data bundles - I also corrected the file naming issue you reported. I tested the new bundles on your Manta vcf file and it did not crash (it did on the old bundle). The long-term solution with a patch for the BayesTyper fasta parser will follow later.
Please do not hesitate to post again if you encounter other issues.
Best regards,
Lasse
Hello BayesTyper team,
first of all congratulations on your recent publication.
I would like to evaluate BayesTyper and compare it with my own work but I am running into some issues so I was wondering if you could help me. I am using your snakemake workflow on three platinum genome samples (NA12878, NA12891, NA12892) but in the manta_conv_all_all rule I get this assertion failure with all three samples of my samples:
I am using BayesTyper 1.3.1 and Manta 1.0.3 (I have also tried Manta 1.4 but I got the same assertion failure) on GRCh38.
The .log file:
Full slurm output file:
Any help appreciated. Hannes