Closed by travis-m-blimkie 6 years ago
Hi,
Yes, getSeq() works with a DNAStringSet object but not with a DNAString object. So unless you have a really good reason to use a DNAString object, why not use the DNAStringSet object returned by readDNAStringSet()? Note that the DNAStringSet object x must have names on it, and the seqnames in the GRanges object must be valid x names (i.e. they must belong to names(x)):
library(BSgenome)
x <- DNAStringSet(c(chr1="ACAANAAGG", chr2="GGGGTTT"))
getSeq(x, GRanges(c("chr2:4-7:-", "chr1:2-7")))
#   A DNAStringSet instance of length 2
#     width seq
# [1]     4 AAAC
# [2]     6 CAANAA
With a length-1 DNAStringSet:
x1 <- x[1]
x1
#   A DNAStringSet instance of length 1
#     width seq          names
# [1]     9 ACAANAAGG    chr1
getSeq(x1, GRanges(c("chr1:8-9", "chr1:2-7")))
#   A DNAStringSet instance of length 2
#     width seq
# [1]     2 GG
# [2]     6 CAANAA
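If the genome is already held as a DNAString, one way to follow the advice above is to wrap it back into a named, length-1 DNAStringSet and build the GRanges from the coordinate columns of a data frame. The sketch below is only an illustration under assumptions: the toy sequence, the sequence name "chr", and the genes data frame are hypothetical stand-ins, not data from this thread.

```r
library(Biostrings)
library(GenomicRanges)
library(BSgenome)  # provides the getSeq() method for XStringSet objects

# Toy stand-in for a single-chromosome genome held as a DNAString (hypothetical)
genome <- DNAString("ACAANAAGGGGGGTTT")

# Wrap it in a length-1 DNAStringSet and name it; "chr" is a made-up name
genome_set <- DNAStringSet(genome)
names(genome_set) <- "chr"

# Toy coordinate table (hypothetical values, one row per gene)
genes <- data.frame(
  locus_tag = c("g1", "g2"),
  start     = c(2, 13),
  end       = c(7, 16),
  strand    = c("+", "-")
)

# The seqnames must match names(genome_set) exactly
gr <- GRanges(
  seqnames = "chr",
  ranges   = IRanges(start = genes$start, end = genes$end),
  strand   = genes$strand
)

# Minus-strand ranges come back reverse-complemented
seqs <- getSeq(genome_set, gr)
seqs
```

With a real FASTA file, the names of the DNAStringSet are the full header lines, so the seqnames in the GRanges have to match them exactly, or the names can be shortened first (for example with names(x) <- sub(" .*", "", names(x))).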
Hope this helps, H.
Thanks for the help, I got it to work. Cheers!
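For completeness, if a DNAString really is needed, the same kind of ranges can be pulled out without getSeq(). This is a rough sketch, assuming Biostrings' extractAt() and reverseComplement(); the toy sequence, coordinates, and strands are made up for illustration.

```r
library(Biostrings)

# Toy single-sequence genome kept as a DNAString (hypothetical)
genome <- DNAString("ACAANAAGGGGGGTTT")

# Hypothetical gene coordinates and strands
rng    <- IRanges(start = c(2, 13), end = c(7, 16))
strand <- c("+", "-")

# extractAt() on an XString returns an XStringSet, one element per range
seqs <- extractAt(genome, rng)

# Ranges on the minus strand need to be reverse-complemented by hand
minus <- strand == "-"
seqs[minus] <- reverseComplement(seqs[minus])
seqs
```

Unlike getSeq() with a GRanges, this approach does not handle strand automatically, which is why the reverse complement is applied explicitly.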
Hi,
I'm having the same issue
> test
DNAStringSet object of length 10000:
width seq names
[1] 90735 CGTTGGTTTCTAAGCTTTACA...AACGGAGCAGTGTAATGGCTC 0000000001
[2] 77905 ATCGGCAAACTGGCTGCGTGG...GAGCCTGACGATGACCTACTT 0000000002
[3] 74184 TATCTTCACCTAATCCAAGGA...GTAAGTAAGTAGGTAAACATA 0000000003
[4] 68120 ATCGTTCCCAGGCCGGTATGT...GATCGTGATTGAACTTATTGA 0000000004
[5] 67765 TGATGTGGTTGCAGTAGCTGC...AAATACATACATATCGAGGAC 0000000005
... ... ...
[9996] 1837 TCATTTAAAACTTTTAAATCA...ACATATAGTTATCTGCTATTT 0000009996
[9997] 1837 GGATGTGGGGCGCGTACCAGC...AAGATGGTGATTTTTCGATGC 0000009997
[9998] 1837 GTTTGGCCGGTGCTATTGGCT...AGTTCCCATGCTTCATCTCTC 0000009998
[9999] 1837 CGTTTCCGCCGCTTCGTTCAG...ACAAGGAGCGATTCGTTGCCA 0000009999
[10000] 1837 GTCGCGGGCTTTGACGTCGGT...CGGAAATGGTCAAGCGGGATT 0000010000
> test <- readDNAStringSet("test.fasta", format="fasta")
> sequences <- getSeq(test)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘getSeq’ for signature ‘"DNAStringSet"’
Any thoughts on why this is happening? Thank you
@susheelbhanu It seems that getSeq from Biostrings only has methods for a limited set of classes:
r$> showMethods("getSeq")
Function: getSeq (package Biostrings)
x="FaFile"
x="FaFileList"
x="TwoBitFile"
You need to load the BSgenome package first to get the full set of getSeq methods:
r$> library(BSgenome)
showMethods("getSeq")
Function: getSeq (package Biostrings)
x="BSgenome"
x="FaFile"
x="FaFileList"
x="TwoBitFile"
x="XStringSet"
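The same check can be done programmatically. This is a small sketch, assuming the method table shown above (it may differ between package versions): selectMethod() with optional = TRUE returns NULL when no method, direct or inherited, applies to the given class.

```r
library(methods)
library(BSgenome)  # also attaches Biostrings

# A DNAStringSet inherits from XStringSet, so the XStringSet method applies
has_set <- !is.null(selectMethod("getSeq", "DNAStringSet", optional = TRUE))

# A single DNAString is not an XStringSet, so no method is expected here
has_single <- !is.null(selectMethod("getSeq", "DNAString", optional = TRUE))

has_set
has_single
```

This matches the two errors reported in the thread: one came from calling getSeq() on a DNAString, the other from calling it on a DNAStringSet before BSgenome was loaded.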
Hello,
I am trying to extract some sequences from a genome file. I read the genome into RStudio using readDNAStringSet and then converted it to a single DNAString (it's a bacterial genome, so there's only one chromosome/sequence). I have a data frame containing the positions of each gene whose sequence I want to extract, and I have also duplicated the data frame as a GRanges object. Whenever I try to use the getSeq function to retrieve the desired sequences from the genome, I get the following error message:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘getSeq’ for signature ‘"DNAString"’
I have tried using both the GRanges object and specifying the appropriate columns from the data frame. I have included some of my code below. Any help would be greatly appreciated.
Read in the genome sequence
pao1_genome <- readDNAStringSet("../PAO1_genome/GCF_000006765.1_ASM676v1_genomic.fna", format = "fasta")
pao1_genome <- DNAString(pao1_genome$`NC_002516.2 Pseudomonas aeruginosa PAO1 chromosome, complete genome`)
Try to extract sequences with data frame
getSeq(pao1_genome, master_genes_list$locus_tag, start = master_genes_list$promoter, end = master_genes_list$start, strand = master_genes_list$strand)
And with the GRanges object
getSeq(x = pao1_genome, names = pao1_gr)
Data frame
  locus_tag Up_Down  start    end strand feature_interval_length OperonID promoter
1    PA0070      Up  82404  83318      -                     915    12045    82154
2    PA0078      Up  95048  96397      -                    1350    12046    94798
3    PA0082      Up 100124 101158      +                    1035    12047    99874
GRanges
GRanges object with 155 ranges and 0 metadata columns:
      seqnames ranges strand