Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation
24 stars 20 forks source link

predictCoding with incorrect predictions #62

Closed wshropshire closed 2 years ago

wshropshire commented 2 years ago

Hello,

I'm running the code below using the attached vcf and GenBank file.

MFS_FTP_MB1860_merge.vcf.gz NZ_CP049085.2.gb.gz

`pacman::p_load( VariantAnnotation, genbankr, tidyverse, GenomicFeatures )

Set working directory

setwd("~/Documents/Active_projects/Exp_Evol_Enterobacterales/results/MB1860_variant_analysis/")

import vcf

vcf<-readVcf("results.vcf")

vcf <- readVcf("MFS_FTP_MB1860_merge.vcf")

import reference genbank file

gb<-readGenBank("NZ_CP049085.2.gb")

gff <- makeTxDbFromGFF("NZ_CP049085.2.gff3")

Need seqlevels to be equivalent between query (vcf2) and subject (txdb)

seqlevels(vcf) <- seqlevels(gb) txdb <-makeTxDbFromGenBank(gb)

for locating the coding variants

codingvar <- locateVariants(vcf, gff, CodingVariants()) allvar <- locateVariants(vcf, gff, AllVariants())

predict amino acid change

coding <- predictCoding(vcf, txdb, gb)`

When going through the output, everything looks fine except that the consequence, refcodon, altcodon appears to not be correct when referencing both the reference fasta file and vcf file. I noted that TXID and CDSID don't match, which I thought may indicate that perhaps the translation database isn't indexing the same positions as the Gb file? I'm not sure, but wanted to know if there was a way to keep these fields consistent so that predictCoding works accurately.

Thanks!

Will

wshropshire commented 2 years ago

Looks like all I needed to do was index my gff file to get the correct coordinates. Thanks!