When going through the output, everything looks fine except that the consequence, refcodon, altcodon appears to not be correct when referencing both the reference fasta file and vcf file. I noted that TXID and CDSID don't match, which I thought may indicate that perhaps the translation database isn't indexing the same positions as the Gb file? I'm not sure, but wanted to know if there was a way to keep these fields consistent so that predictCoding works accurately.
Hello,
I'm running the code below using the attached vcf and GenBank file.
MFS_FTP_MB1860_merge.vcf.gz NZ_CP049085.2.gb.gz
`pacman::p_load( VariantAnnotation, genbankr, tidyverse, GenomicFeatures )
Set working directory
setwd("~/Documents/Active_projects/Exp_Evol_Enterobacterales/results/MB1860_variant_analysis/")
import vcf
vcf<-readVcf("results.vcf")
vcf <- readVcf("MFS_FTP_MB1860_merge.vcf")
import reference genbank file
gb<-readGenBank("NZ_CP049085.2.gb")
gff <- makeTxDbFromGFF("NZ_CP049085.2.gff3")
Need seqlevels to be equivalent between query (vcf2) and subject (txdb)
seqlevels(vcf) <- seqlevels(gb) txdb <-makeTxDbFromGenBank(gb)
for locating the coding variants
codingvar <- locateVariants(vcf, gff, CodingVariants()) allvar <- locateVariants(vcf, gff, AllVariants())
predict amino acid change
coding <- predictCoding(vcf, txdb, gb)`
When going through the output, everything looks fine except that the consequence, refcodon, altcodon appears to not be correct when referencing both the reference fasta file and vcf file. I noted that TXID and CDSID don't match, which I thought may indicate that perhaps the translation database isn't indexing the same positions as the Gb file? I'm not sure, but wanted to know if there was a way to keep these fields consistent so that predictCoding works accurately.
Thanks!
Will