gmbecker / genbankr

http://bioconductor.org/packages/devel/bioc/html/genbankr.html

Error reading Plasmid Genbank files #19

Open chagas98 opened 2 years ago

chagas98 commented 2 years ago

I am trying to read plasmid sequences from different sources and am still getting this error:

Error in .normarg_isCircular(isCircular, seqnames) : the length of the supplied 'isCircular' vector must be equal to the number of sequences

gmbecker commented 2 years ago

Can you give me an accession or a gbk file which causes this issue please?

chagas98 commented 2 years ago

petm20.zip

chagas98 commented 2 years ago

For now I read the gbk files with Python functions, but that was tricky to integrate with the rest of my R code. A solution for plasmid files would be useful in the future. Thanks in advance @gmbecker

gmbecker commented 2 years ago

@chagas98 can you please try with the latest commit here? Locally I'm getting no errors for this file:

> thing <- readGenBank("~/gabe/checkedout/genbankr_gh/inst/unitTests/petm20.gbk")
Annotations don't have 'locus_tag' label, using 'gene' as gene_id column
genes not available for all CDS ranges, using internal grouping ids
No exons read from genbank file. Assuming sections of CDS are full exons
No transcript features (mRNA) found, using spans of CDSs
> thing
GenBank Annotations
synthetic circular DNA 
Accession: .
1 Sequence(s) with total length length: 7700
0 genes
9 transcripts
9 exons/cds elements
0 variations
10 other features

jan-glx commented 2 years ago

Your Locus field contains a t. A wrong regex split the field on it during parsing, which caused the error. This was fixed in 47a0b78
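
For anyone landing here later, a minimal base-R sketch of how this kind of bug can arise. It is not the actual genbankr code: the LOCUS line below is hypothetical (the real one from petm20.gbk is not shown in this thread), and the pattern `[ t]+` is only an assumed stand-in for the faulty regex. A whitespace-splitting pattern that also matches the letter t breaks a locus name containing a t into two pieces, so the parser would see more sequence names than there are sequences.

```r
# Hypothetical LOCUS line; only the length (7700) and "circular" come from the thread.
locus_line <- "LOCUS       petm20    7700 bp    DNA     circular SYN"

# Assumed faulty pattern: the character class matches spaces AND the letter "t".
bad <- strsplit(locus_line, "[ t]+")[[1]]

# Whitespace-only pattern: splits on spaces and tabs, leaves letters alone.
good <- strsplit(locus_line, "[[:space:]]+")[[1]]

print(bad)   # the locus name is broken at the "t"
print(good)  # the locus name stays in one piece
```

With the faulty pattern the name comes back as two fields ("pe" and "m20"), which is consistent with a length mismatch like the 'isCircular' error reported above.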