gmbecker / genbankr

http://bioconductor.org/packages/devel/bioc/html/genbankr.html

Error in switch(seqtype, bp = DNAString(chars), aa = AAString(chars), : EXPR must be a length 1 vector #18

Closed billzt closed 2 years ago

billzt commented 2 years ago

I'm reading a local GenBank file with this command:

gb = readGenBank("test1.gbk")

file 1:

LOCUS       Salvelinus_fon   16624 bp    DNA     circular VRT 13-AUG-2022
DEFINITION  Salvelinus_fon Fish mitochondrion genome.
ACCESSION   Salvelinus_fon
VERSION     Salvelinus_fon
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .

runs OK

file 2:

LOCUS       Salvelinus_fonti   16624 bp    DNA     circular VRT 13-AUG-2022
DEFINITION  Salvelinus_fonti Fish mitochondrion genome.
ACCESSION   Salvelinus_fonti
VERSION     Salvelinus_fonti
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .

fails with this error:

Error in switch(seqtype, bp = DNAString(chars), aa = AAString(chars),  : 
  EXPR must be a length 1 vector

test1.gbk.txt test2.gbk.txt
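
For readers unfamiliar with this message: base R's switch() raises it whenever its first argument is not a single value. The sketch below reproduces the message in isolation. It does not trace genbankr's code, and the zero-length seqtype is an assumption made purely for illustration, not a confirmed diagnosis of this bug.

```r
# `seqtype` stands in for whatever value reached switch() in the failing
# call. Setting it to a zero-length vector is an assumption for illustration.
seqtype <- character(0)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  switch(seqtype, bp = "DNA", aa = "protein"),
  error = function(e) conditionMessage(e)
)
msg
```

If a parser fails to extract the sequence-type token from the LOCUS line, a length check such as length(seqtype) == 1 before the switch() call would turn this into a more descriptive error.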

gmbecker commented 2 years ago

I've pushed the fix for this both here and in Bioc git (master, i.e. Bioc devel only). Please have a look.

I used your test2 file as a regression test, so that case at least is now working.

billzt commented 2 years ago

Sorry for my late reply. I only have R v4.1 and cannot use Bioc devel, so I downloaded the source code directly from this repo (master) and installed it using install.packages("genbankr-master/", repos=NULL, type="source"). It seems to have been fixed.

I'm not good at R programming, but I'm still confused about this:

 readLocus = function(line) {
     ## missing strip fieldname?
     spl = strsplit(line, "[\t]+", line)[[1]]
     spl
 }

In fact, the LOCUS line in the GenBank format does not use \t as a delimiter; it uses spaces. How about treating spaces as delimiters?
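
A minimal sketch of that suggestion, using the LOCUS line from file 2. The helper split_locus is hypothetical and not part of genbankr. It assumes fields are separated by runs of spaces or tabs and that no field contains whitespace.

```r
# Hypothetical helper, not part of genbankr: split a LOCUS line on runs of
# whitespace (spaces or tabs) and drop the leading "LOCUS" keyword.
split_locus <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  fields[-1]
}

locus <- "LOCUS       Salvelinus_fonti   16624 bp    DNA     circular VRT 13-AUG-2022"

# Whitespace splitting yields one element per field.
split_locus(locus)

# For comparison, splitting on tabs alone leaves the line in one piece,
# because this line contains no tab characters.
tab_split <- strsplit(locus, "[\t]+")[[1]]
length(tab_split)
```

One caveat: some LOCUS fields are optional in GenBank files, so picking fields by position after a whitespace split may not be reliable for every file.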