BradyAJohnston / plasmapR

Creating plasmid maps inside ggplot.
https://bradyajohnston.github.io/plasmapR/

read_gb loses part of plasmid feature that spans across the start site #10

Status: Open. Dobrokhotov1989 opened this issue 11 months ago.

Dobrokhotov1989 commented 11 months ago

Hi there,

I've spotted a bug when reading a GenBank (.gbk) file that contains a feature spanning the plasmid "start" (the origin of the sequence coordinates). In the .gbk file it looks like this:

 CDS             join(4891..5096,1..751)
                     /note="pLannotate"
                     /label="TurboID"
                     /database="snapgene"
                     /identity="100.0"
                     /match_length="99.7"
                     /fragment="False"
                     /other="CDS"

However, when the file is read with read_gb(), only the first part (4891..5096) is kept; the second part (1..751) is lost.

This file is used in the example: 559763_pLann.txt

library(plasmapR)
library(magrittr)                  # provides the %>% pipe
my_plasmid <- '559763_pLann.txt'   # GenBank file attached to this issue
read_gb(my_plasmid) %>%
  plot_plasmid()

I would appreciate it if you could take a look.
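For reference, the `join()` location in the snippet above lists two ranges that together make up a single feature. The following is a minimal base-R sketch of splitting such a location string into its parts. `parse_location()` is a hypothetical helper, not a plasmapR function, and it assumes plain `start..end` ranges: it does not handle `complement()` or the `<`/`>` partial-end markers.

```r
# Hypothetical helper (not part of plasmapR): split a GenBank location
# string such as "join(4891..5096,1..751)" into one row per segment.
parse_location <- function(location) {
  # Remove the join( ... ) wrapper if the feature has several parts;
  # a simple "start..end" location is left unchanged.
  inner <- sub("^join\\((.*)\\)$", "\\1", location)
  # Each comma-separated part is a "start..end" range
  parts <- strsplit(inner, ",", fixed = TRUE)[[1]]
  bounds <- strsplit(parts, "..", fixed = TRUE)
  data.frame(
    start = as.integer(vapply(bounds, `[`, character(1), 1)),
    end   = as.integer(vapply(bounds, `[`, character(1), 2))
  )
}

parse_location("join(4891..5096,1..751)")
#>   start  end
#> 1  4891 5096
#> 2     1  751
```

Keeping every segment (rather than only the first) is what preserves the 1..751 part of TurboID.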

BradyAJohnston commented 11 months ago

Thanks for flagging this. You are right that I currently haven't taken into account features that wrap around the 'origin' of the plasmid. I will have to think about how to handle this, as simply giving the features the correct start and end positions won't work nicely with ggplot.
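To illustrate the difficulty: stored as a single range, a feature that crosses the origin has `start > end` (here 4891 to 751). A geom that maps `start` and `end` onto a continuous axis (even one wrapped with polar coordinates) would therefore draw it the long way around the plasmid. One common workaround, shown here only as a hypothetical sketch and not as plasmapR's behaviour, is to split such a feature into two ordinary segments at the origin. The plasmid length of 5096 bp is an assumption inferred from the `join()` location.

```r
# Hypothetical helper (not part of plasmapR): a feature given as a single
# start/end pair with start > end wraps past the origin. Split it into two
# ordinary segments so each can be drawn on a 1..plasmid_length axis.
split_wrapped <- function(start, end, plasmid_length) {
  if (start <= end) {
    # Does not cross the origin: keep as one segment
    return(data.frame(start = start, end = end))
  }
  data.frame(
    start = c(start, 1L),            # from the feature start to the last base
    end   = c(plasmid_length, end)   # from base 1 to the feature end
  )
}

split_wrapped(4891L, 751L, plasmid_length = 5096L)
#>   start  end
#> 1  4891 5096
#> 2     1  751
```

The catch is that the two pieces are then separate shapes, so an arrowhead or label meant for the whole feature has to be attached to only one of them.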

BradyAJohnston commented 11 months ago

Is there a particular reason that the origin point for the plasmid is in this location? One solution would be to shift the origin point for all of the features to a different location in the plasmid that doesn't overlap any feature. I figure this will be the easiest approach. I might not do it at the parsing step; it could instead be done at the plotting stage.
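The re-basing idea can be sketched in base R as follows. Both functions are hypothetical (not plasmapR code), and the second feature (800..1500) and the 5096 bp length are made-up or assumed values. The sketch finds the first base not covered by any feature, then renumbers every position so that base becomes position 1, using `((pos - new_origin) %% plasmid_length) + 1`. Because no feature contains the new origin, no feature can cross it, so every feature ends up with `start <= end`.

```r
# Hypothetical sketch: renumber 1-based positions so `new_origin` becomes 1.
rotate_positions <- function(pos, new_origin, plasmid_length) {
  ((pos - new_origin) %% plasmid_length) + 1
}

# Return the first base not covered by any feature on a circular plasmid.
# Features are start/end pairs; start > end means the feature wraps.
find_free_origin <- function(starts, ends, plasmid_length) {
  covered <- logical(plasmid_length)
  for (i in seq_along(starts)) {
    if (starts[i] <= ends[i]) {
      idx <- starts[i]:ends[i]
    } else {
      idx <- c(starts[i]:plasmid_length, 1:ends[i])
    }
    covered[idx] <- TRUE
  }
  free <- which(!covered)
  if (length(free) == 0) stop("every base is covered by a feature")
  free[1]
}

starts <- c(4891, 800)   # TurboID from the issue, plus a made-up feature
ends   <- c(751, 1500)
len    <- 5096           # assumed plasmid length

origin <- find_free_origin(starts, ends, len)   # 752
rotate_positions(starts, origin, len)           # 4140   49
rotate_positions(ends, origin, len)             # 5096  749
```

After rotation, TurboID runs from 4140 to 5096 and no longer crosses the origin. Doing this only at the plotting stage would leave the coordinates returned by `read_gb()` untouched.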

Dobrokhotov1989 commented 11 months ago

Thanks, Brady, for the swift reply. There is no particular reason for the origin being in the middle of a feature; that is just how the plasmid sequencing results came back.