Visualize contig genbank file

Edinburgh-Genome-Foundry / DnaFeaturesViewer

:eye: Python library to plot DNA sequence features (e.g. from Genbank files)

https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/

MIT License

584 stars 90 forks source link

Visualize contig genbank file #4

Open sarah872 opened 7 years ago

sarah872 commented 7 years ago

Hey there, I am really astonished how easy it is with your script to visualize genes, but unfortunately I am working a lot with genome drafts, so unclosed genomes with many contigs in one genbank file. Is there a way to parse more than one and not getting the error ValueError: More than one record found in handle Cheers

Zulko commented 7 years ago

I haven't tested it but you could try with SeqIO.parse(), like this:

from Bio import SeqIO
from dna_features_viewer import BiopythonTranslator
translator = BiopythonTranslator()
graphic_records = [
    translator.translate_record(record)
    for record in SeqIO.parse('my_multiple_records.gb', 'genbank')
]

Then to plot them, as usual:

for i, graphic_record in enumerate(graphic_records):
    ax, _ = graphic_record.plot()
    ax.figure.savefig('record_%03d.png' % i)

sarah872 commented 7 years ago

If I execute this using my file, I get many files that have the right axis length, but there is one gene over the whole contig that is called source that's an excerpt of my file:

LOCUS       contig         4395 bp    DNA     linear   UNK 
DEFINITION  name
ACCESSION   contig
KEYWORDS    .
SOURCE      organism
  ORGANISM  organism
            Bacteria.
FEATURES             Location/Qualifiers
     source          1..4395
                     /mol_type="genomic DNA"
                     /db_xref="taxon:"
                     /genome_md5="something"
                     /project="something"
                     /genome_id="something"
                     /organism="something"
     CDS             complement(36..689)
                     /db_xref="SEED:fig|6666666.256796.peg.1026"
                     /translation="MMMMMMMMMMMMMMGSGKYQSDIVR"
                     /product="hypothetical protein"
                     /transl_table=11
BASE COUNT      903 a   1307 c   1323 g    862 t
ORIGIN      
        1 taaaaatctc gggattttga atcattaaga attaactacc ttactatatc actctgatat
       61 ttaccactac cccggatcgc tgccgggtta cccgccgagg tggagtcatg cagactcttg
      121 cgcaccacca gggtgccgtc cgcgtcggta aagacgttct cggtggcggc catccccgcc
      181 atctccttga tggggttgag gcgatagggg agatagcgca ggaagatccg ttcggcataa

Zulko commented 7 years ago

I am not sure I understand. Can you please explain from the beginning what you are trying to achieve ?

sarah872 commented 7 years ago

I want to plot all the protein coding genes that are in one contig, so as an output I have as many plots as contigs, each plot with the respective genes.

Zulko commented 7 years ago

Still not clear sorry - I take it that what you are trying to do would be obvious for people from the same field, but in my case I cant connect the dots in your different messages. Try explaining like I am 5 :stuck_out_tongue:

sarah872 commented 7 years ago

ahahah ok 👍 I am working a lot with metagenomic data, so I have incomplete draft genome, that means many scaffolds/contigs that represent one genome, but are not connected to a full chromosome. Therefore I try to visualize all contigs individually, each with its CDS. Thanks for your help!

sarah872 commented 7 years ago

any news on that?