Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

getSeq: Unable to find inherited method for DNAString #11

Closed travis-m-blimkie closed 6 years ago

travis-m-blimkie commented 6 years ago

Hello, I am trying to extract some sequences from a genome file. I read the genome into RStudio using readDNAStringSet, then converted it to a plain DNAString (it's a bacterial genome, so there's only one chromosome/sequence). I have a data frame containing the positions of each gene whose sequence I want to extract. I have also converted the data frame to a GenomicRanges (GRanges) object. Whenever I try to use the getSeq function to retrieve the desired sequences from the genome, I get the following error message:

Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘getSeq’ for signature ‘"DNAString"’

I have tried both passing the GRanges object and specifying the appropriate columns from the data frame. I have included some of my code below. Any help would be greatly appreciated.

Read in the genome sequence

pao1_genome <- readDNAStringSet("../PAO1_genome/GCF_000006765.1_ASM676v1_genomic.fna", format = "fasta")
pao1_genome <- DNAString(pao1_genome$`NC_002516.2 Pseudomonas aeruginosa PAO1 chromosome, complete genome`)

Try to extract sequences with data frame

getSeq(pao1_genome, master_genes_list$locus_tag, start = master_genes_list$promoter, end = master_genes_list$start, strand = master_genes_list$strand)

And with the GRanges object

getSeq(x = pao1_genome, names = pao1_gr)

Data frame

  locus_tag Up_Down  start    end strand feature_interval_length OperonID promoter
1    PA0070      Up  82404  83318      -                     915    12045    82154
2    PA0078      Up  95048  96397      -                    1350    12046    94798
3    PA0082      Up 100124 101158      +                    1035    12047    99874

GRanges

GRanges object with 155 ranges and 0 metadata columns:
      seqnames           ranges strand
  [1]   PA0070 [ 82154,  82404]      -
  [2]   PA0078 [ 94798,  95048]      -
  [3]   PA0082 [ 99874, 100124]      +

SessionInfo

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8
 [6] LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.46.0    XVector_0.18.0       GenomicRanges_1.30.3 GenomeInfoDb_1.14.0  IRanges_2.12.0       S4Vectors_0.16.0
[7] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.24.0        compiler_3.4.4         tools_3.4.4            GenomeInfoDbData_1.0.0 RCurl_1.95-4.10        yaml_2.1.18
[7] bitops_1.0-6

Thanks!

hpages commented 6 years ago

Hi,

Yes, getSeq() works with a DNAStringSet object but not with a DNAString object. So unless you have a really good reason to use a DNAString object, why not use the DNAStringSet object returned by readDNAStringSet()? Note that the DNAStringSet object x must have names on it and the seqnames in the GRanges object must be valid x names (i.e. they must belong to names(x)):

library(BSgenome)

x <- DNAStringSet(c(chr1="ACAANAAGG", chr2="GGGGTTT"))

getSeq(x, GRanges(c("chr2:4-7:-", "chr1:2-7")))
#  A DNAStringSet instance of length 2
#     width seq
# [1]     4 AAAC
# [2]     6 CAANAA

With a length-1 DNAStringSet:

x1 <- x[1]
x1
#   A DNAStringSet instance of length 1
#     width seq                                               names               
# [1]     9 ACAANAAGG                                         chr1

getSeq(x1, GRanges(c("chr1:8-9", "chr1:2-7")))
#   A DNAStringSet instance of length 2
#     width seq
# [1]     2 GG
# [2]     6 CAANAA
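
The same pattern can be applied to a table of gene coordinates like the one in the question. The following is only a minimal sketch under stated assumptions: the toy sequence, the sequence name "chrom1", and the `genes` data frame with its locus tags are all hypothetical stand-ins, not the original data. The key point is that the seqnames of the GRanges come from names() of the DNAStringSet, while the locus tags are attached to the result afterwards.

```r
library(BSgenome)  # also attaches Biostrings and GenomicRanges

# Toy stand-in for a single-chromosome genome as returned by
# readDNAStringSet(); the name "chrom1" is hypothetical.
genome <- DNAStringSet(c(chrom1 = "ACGTTGCAAGGCTTAACCGGTTAGC"))

# Hypothetical gene table; locus_tag is an identifier only and
# must not be used as the seqnames.
genes <- data.frame(
  locus_tag = c("geneA", "geneB"),
  start     = c(3, 10),
  end       = c(8, 15),
  strand    = c("+", "-"),
  stringsAsFactors = FALSE
)

# Every range lives on the one sequence in 'genome', so the seqnames
# are taken from names(genome) rather than from the locus tags.
gr <- GRanges(
  seqnames = names(genome)[1],
  ranges   = IRanges(start = genes$start, end = genes$end),
  strand   = genes$strand
)

# Ranges on the "-" strand come back reverse-complemented.
seqs <- getSeq(genome, gr)

# Label the extracted sequences with the locus tags.
names(seqs) <- genes$locus_tag
seqs
```

Using names(genome)[1] avoids retyping a long FASTA header by hand, which is one plausible way for seqnames and sequence names to drift apart.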

Hope this helps, H.

travis-m-blimkie commented 6 years ago

Thanks for the help, I got it to work. Cheers!

susheelbhanu commented 3 months ago

Hi,

I'm having the same issue

> test

DNAStringSet object of length 10000:
        width seq                                           names
    [1] 90735 CGTTGGTTTCTAAGCTTTACA...AACGGAGCAGTGTAATGGCTC 0000000001
    [2] 77905 ATCGGCAAACTGGCTGCGTGG...GAGCCTGACGATGACCTACTT 0000000002
    [3] 74184 TATCTTCACCTAATCCAAGGA...GTAAGTAAGTAGGTAAACATA 0000000003
    [4] 68120 ATCGTTCCCAGGCCGGTATGT...GATCGTGATTGAACTTATTGA 0000000004
    [5] 67765 TGATGTGGTTGCAGTAGCTGC...AAATACATACATATCGAGGAC 0000000005
    ...   ... ...
 [9996]  1837 TCATTTAAAACTTTTAAATCA...ACATATAGTTATCTGCTATTT 0000009996
 [9997]  1837 GGATGTGGGGCGCGTACCAGC...AAGATGGTGATTTTTCGATGC 0000009997
 [9998]  1837 GTTTGGCCGGTGCTATTGGCT...AGTTCCCATGCTTCATCTCTC 0000009998
 [9999]  1837 CGTTTCCGCCGCTTCGTTCAG...ACAAGGAGCGATTCGTTGCCA 0000009999
[10000]  1837 GTCGCGGGCTTTGACGTCGGT...CGGAAATGGTCAAGCGGGATT 0000010000
> test <- readDNAStringSet("test.fasta", format="fasta")

> sequences <- getSeq(test)

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getSeq’ for signature ‘"DNAStringSet"’

Any thoughts on why this is happening? Thank you

h4rvey-g commented 2 months ago

@susheelbhanu It seems that, with only Biostrings loaded, getSeq has methods for just a few classes:

r$> showMethods("getSeq")
Function: getSeq (package Biostrings)
x="FaFile"
x="FaFileList"
x="TwoBitFile"

You need to load the BSgenome package first to get the full set of getSeq methods, including the one for XStringSet (the parent class of DNAStringSet):

r$> library(BSgenome)
    showMethods("getSeq")

Function: getSeq (package Biostrings)
x="BSgenome"
x="FaFile"
x="FaFileList"
x="TwoBitFile"
x="XStringSet"
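
As a rough sketch of how this could be checked programmatically rather than by reading the showMethods() listing: existsMethod() from the methods package reports whether a method is registered directly for a given class. The toy sequences below are reused from the earlier reply in this thread; the variable names `ok` and `sub` are just illustrative.

```r
library(Biostrings)
library(BSgenome)  # registers the getSeq() method for XStringSet

# TRUE once BSgenome is loaded; DNAStringSet objects reach this
# method through inheritance from XStringSet.
ok <- existsMethod("getSeq", "XStringSet")
ok

# With the method available, pass the ranges to extract as the
# second argument.
x <- DNAStringSet(c(chr1 = "ACAANAAGG", chr2 = "GGGGTTT"))
sub <- getSeq(x, GRanges("chr1:2-7"))
sub
```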