BradyAJohnston / plasmapR

Creating plasmid maps inside ggplot.
https://bradyajohnston.github.io/plasmapR/

Using GFF/GFF3 as input #9

Open jqbeh opened 1 year ago

jqbeh commented 1 year ago

Hi, thanks for making this wonderful package!

I was wondering if I could use .gff or .gff3 as input instead of .gb?

Thanks!

BradyAJohnston commented 1 year ago

Hi mate. It would require writing a parser for these files. This could be done, but I am unfamiliar with this file type. Could you provide some examples or links to examples?

jqbeh commented 1 year ago

Hi, thanks for the reply!

I am unable to upload the file here but you can download a sample GFF3 file from this link: https://www.ncbi.nlm.nih.gov/nuccore/NZ_LC495616.1

bryantmurphy commented 5 months ago

Since GFF3 files are already in a tab-delimited, tabular format, no dedicated parser is needed; getting them to work with plot_plasmid() is just a data-wrangling problem. For example, you can do:

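library(tidyverse)
library(plasmapR)

# GFF3 has nine tab-separated columns and no header row; "#" lines are comments/directives
dat <- read_tsv("sequence.gff3", comment = "#", col_names = FALSE) %>%
  # keep feature type, start, end, strand, and the attribute string
  select(type = 3, start = 4, end = 5, direction = 7, attribute = 9) %>%
  # drop the "gene" rows
  filter(type != "gene") %>%
  mutate(direction = ifelse(direction == "+", 1, -1),  # strand: "+" becomes 1, anything else -1
         type = ifelse(type == "region", "source", type),  # relabel the whole-sequence row
         # label: Name for the source row, locus_tag for every other feature
         # (\\w+ matches only letters, digits, and "_", and the value must be followed by ";")
         name = ifelse(type == "source",
                       str_match(attribute, "Name=(\\w+);")[,2],
                       str_match(attribute, "locus_tag=(\\w+);")[,2])) %>%
  rownames_to_column("index")

# use the first row's name (the source row) as the plot title
plot_plasmid(dat, dat$name[1])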
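The same wrangling can also be sketched as a reusable base-R function, which avoids the tidyverse dependency and does not need the attribute value to be followed by a semicolon. This is only a rough sketch, not part of plasmapR: `gff3_attr()` and `gff3_to_features()` are hypothetical helper names, the output columns simply mirror the snippet above, and the example lines are made-up data rather than a real record.

```r
# Hypothetical helper (not part of plasmapR): extract the value of one key
# from GFF3 attribute strings such as "ID=cds0;locus_tag=EX_0001".
gff3_attr <- function(attributes, key) {
  pattern <- paste0("(^|;)", key, "=([^;]*)")
  hits <- regmatches(attributes, regexec(pattern, attributes))
  # each hit is c(full match, separator, value), or character(0) if the key is absent
  vapply(hits, function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1))
}

# Hypothetical helper: turn raw GFF3 lines into a feature table with the
# same columns as the tidyverse snippet above.
gff3_to_features <- function(lines) {
  # drop blank lines and "#" comment/directive lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # keep only well-formed rows with all nine GFF3 columns
  fields <- fields[lengths(fields) == 9]
  stopifnot(length(fields) > 0)
  cols <- do.call(rbind, fields)

  feats <- data.frame(
    type = cols[, 3],
    start = as.integer(cols[, 4]),
    end = as.integer(cols[, 5]),
    direction = ifelse(cols[, 7] == "+", 1, -1),
    attribute = cols[, 9],
    stringsAsFactors = FALSE
  )

  # same choices as the snippet above: drop gene rows, relabel region as source
  feats <- feats[feats$type != "gene", ]
  feats$type[feats$type == "region"] <- "source"
  feats$name <- ifelse(feats$type == "source",
                       gff3_attr(feats$attribute, "Name"),
                       gff3_attr(feats$attribute, "locus_tag"))
  feats$index <- as.character(seq_len(nrow(feats)))
  rownames(feats) <- NULL
  feats[, c("index", "type", "start", "end", "direction", "attribute", "name")]
}

# Made-up example lines, for illustration only
example_lines <- c(
  "##gff-version 3",
  "plasmid1\tRefSeq\tregion\t1\t5000\t.\t+\t.\tID=plasmid1:1..5000;Name=pExample",
  "plasmid1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene0;locus_tag=EX_0001",
  "plasmid1\tRefSeq\tCDS\t100\t900\t.\t+\t0\tID=cds0;locus_tag=EX_0001;product=demo",
  "plasmid1\tRefSeq\tCDS\t1500\t2400\t.\t-\t0\tID=cds1;locus_tag=EX_0002"
)
features <- gff3_to_features(example_lines)
```

With a real file, the input would presumably come from `readLines("sequence.gff3")`, and the result could be passed to `plot_plasmid()` in the same way as `dat` above.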