Set working directory

setwd("~/Documents/Active_projects/Exp_Evol_Enterobacterales/results/MB1860_variant_analysis/")

import vcf

vcf<-readVcf("results.vcf")

vcf <- readVcf("MFS_FTP_MB1860_merge.vcf")

import reference genbank file

gb<-readGenBank("NZ_CP049085.2.gb")

gff <- makeTxDbFromGFF("NZ_CP049085.2.gff3")

Need seqlevels to be equivalent between query (vcf2) and subject (txdb)

seqlevels(vcf) <- seqlevels(gb) txdb <-makeTxDbFromGenBank(gb)

for locating the coding variants

codingvar <- locateVariants(vcf, gff, CodingVariants()) allvar <- locateVariants(vcf, gff, AllVariants())

predict amino acid change

coding <- predictCoding(vcf, txdb, gb)`

When going through the output, everything looks fine except that the consequence, refcodon, altcodon appears to not be correct when referencing both the reference fasta file and vcf file. I noted that TXID and CDSID don't match, which I thought may indicate that perhaps the translation database isn't indexing the same positions as the Gb file? I'm not sure, but wanted to know if there was a way to keep these fields consistent so that predictCoding works accurately.

Thanks!

Will

Bioconductor / VariantAnnotation

predictCoding with incorrect predictions #62