nilsj9 commented 4 years ago

Hi @gmbecker, I am currently trying to parse a batch of plastid genome records with genbankr. In doing so, I keep running into the same error messages, and I wonder whether they are caused by a bug in genbankr or by incorrectly formatted GenBank flat files. Below are the three most frequent error messages:

Error in `[[<-`(`*tmp*`, name, value = c("BWX36_gp082.1", "BWX36_gp082.1",  : 
  28 elements in value to replace 44 elements

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  In range 13: at least two out of 'start', 'end', and 'width', must
  be supplied.
In addition: Warning messages:
1: In FUN(X[[i]], ...) : NAs introduced by coercion
2: In FUN(X[[i]], ...) : NAs introduced by coercion

Error : subscript contains NAs

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

I would be very grateful if you could help me fix these problems. Thank you in advance and best wishes.

kathooks commented 2 years ago

Hi @gmbecker, hi @nilsj9,

I run into the first of these errors with several human RefSeq identifiers. For example:

Annotations don't have 'locus_tag' label, using 'gene' as gene_id column
Annotations don't have 'locus_tag' label, using 'gene' as gene_id column
Error in `[[<-`(`*tmp*`, name, value = c("COL17A1.1", "COL17A1.1", "COL17A1.1",  : 
  53 elements in value to replace 56 elements

The error originates from line 873 of genbankReader.R. Parsing works when that line is replaced with:

exns$transcript_id = cdss$transcript_id