Bioconductor / GenomicFeatures

Query the gene models of a given organism/assembly
https://bioconductor.org/packages/GenomicFeatures

extractTranscriptSeqs() extracted sequences with non-base letters #45

Closed: lbwfff closed this issue 2 years ago

lbwfff commented 2 years ago

Hi,

I want to use extractTranscriptSeqs() to extract the CDS sequences that carry SNPs and then get the corresponding protein sequences. Here is my code:

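library(BSgenome.Hsapiens.UCSC.hg38)
library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
library(GenomicFeatures)

ref_genome <- BSgenome.Hsapiens.UCSC.hg38
alt_genome <- injectSNPs(ref_genome, SNPlocs.Hsapiens.dbSNP141.GRCh38)

# txdb: an hg38 TxDb object created elsewhere (not shown in the post)
cds <- cdsBy(txdb, by = "tx", use.names = TRUE)
extractcds <- extractTranscriptSeqs(alt_genome, cds)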

When I do this, the resulting sequences contain many non-base letters such as "R", "W" and "S". What information do these letters represent? And how should I modify the code so that I can use the translate() function from Biostrings?

Thanks, LeeLee

hpages commented 2 years ago

Hi @lbwfff,

Did you read the man page about what injectSNPs() does?

Please ask on our support site if you have any further questions about this.

Thanks, H.
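For reference: the letters in question are IUPAC ambiguity codes. As described in the injectSNPs() man page, each SNP position is written as the ambiguity letter covering its alleles ("R" is A or G, "W" is A or T, "S" is C or G). The sketch below needs only Biostrings. It uses two made-up CDS sequences (the names tx1 and tx2 are hypothetical) to show what the letters decode to and how the if.fuzzy.codon argument of translate() handles codons that contain them. It is a minimal illustration under these assumptions, not the maintainers' recommended workflow.

```r
library(Biostrings)

# Each IUPAC ambiguity letter maps to the set of bases it stands for.
codes <- IUPAC_CODE_MAP[c("R", "W", "S")]
print(codes)

# Hypothetical toy CDS sequences, each with one SNP encoded as "R" (A or G).
cds_seqs <- DNAStringSet(c(
  tx1 = "ATGGCRTAA",  # GCR = GCA or GCG, both Ala
  tx2 = "ATGRCTTAA"   # RCT = ACT (Thr) or GCT (Ala)
))

# Default if.fuzzy.codon = "error": translate() stops at the first fuzzy codon,
# so here we capture the error message instead of a result.
res_default <- tryCatch(
  translate(cds_seqs),
  error = function(e) conditionMessage(e)
)

# "solve": a fuzzy codon is translated when all of its expansions give the
# same amino acid (or stop); otherwise it becomes "X".
aa_solved <- translate(cds_seqs, if.fuzzy.codon = "solve")
print(aa_solved)

# "X": every fuzzy codon becomes "X", even one that would resolve unambiguously.
aa_x <- translate(cds_seqs, if.fuzzy.codon = "X")
print(aa_x)
```

With "solve", tx1 translates to "MA*" because both possible codons encode alanine, while tx2 gives "MX*" because the two alleles encode different amino acids. Which option fits depends on whether you want to keep unambiguous SNPs translated or flag every SNP-containing codon.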