Edinburgh-Genome-Foundry / DnaFeaturesViewer

:eye: Python library to plot DNA sequence features (e.g. from Genbank files)
https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/
MIT License
584 stars 90 forks source link

Changing feature labels form gbk record #8

Closed jrjhealey closed 6 years ago

jrjhealey commented 6 years ago

Hi Zulko,

Love the package! Exactly what I've been looking for.

I have a couple of queries about the best way to implement the following things:

Firstly, I'd like to maybe colour each CDS on an individual basis using Matplotlib's colourscales. How would you advise implementing this? I've been using the code below, so I'm thinking I could maybe iterate a list of colours and zip it to the feature list so that each feature is paired with an individual colour? Any better suggestions?

Secondly, is there some way to access additional options for labelling features? In the documented examples, you've shown how to forcibly renaming all the CDS to a specific string ("CDS here"). I'd like to label some/all of my features with the /product tag from a Genbank, as it's more informative than the locus tags. Is there some way to access these in the .features object? I can't see something that looks like it corresponds to this.

More generally, could I put a request in for some more 'fully featured' examples in the documentation that could be deconstructed (if you get the time of course!) ? I'd like to learn how to use the package in much more depth for the future.

Many thanks!

import os, platform, sys
from os.path import expanduser
home = expanduser("~")
import matplotlib
if platform.system() == "Darwin":
    matplotlib.use('TkAgg') # Avoid python framework errors on OSX

__author__ = "Joe R. J. Healey"
__version__ = "1.0.0"
__title__ = "PrettyPlotter"
__license__ = "GPLv3"
__author_email__ = "J.R.J.Healey@warwick.ac.uk"

from dna_features_viewer import BiopythonTranslator

class MyCustomTranslator(BiopythonTranslator):
    """Custom translator implementing the following theme:

    - Color terminators in green, CDS in blue, all other features in gold.
    - Do not display features that are restriction sites unless they are BamHI
    - Do not display labels for restriction sites
    - For CDS labels just write "CDS here" instead of the name of the gene.

    """

    def compute_feature_color(self, feature):
        if feature.type == "CDS":
            return "blue"
        # maybe zip() together iterated features and a list of colours from a colour scale rather than
        # having all features the same colour?
        elif feature.type == "terminator":
            return "green"
        else:
            return "gold"

    def compute_feature_label(self, feature):
        if feature.type == "CDS":
            return BiopythonTranslator.compute_feature_label(feature.description) # how to get /product description to replace CDS name/locus tag?

    def compute_filtered_features(self, features):
        """Do not display promoters. Just because."""
        return [feature for feature in features if (feature.type != "gene")]

def parse_args():
    """Parse commandline arguments"""
    import argparse

    try:
        parser = argparse.ArgumentParser(
            description='Make pretty images from sequence files (.dna/.gbk/.gff).')
        parser.add_argument(
            'seqfile',
            action='store',
            help='Input sequence file. Supported filetypes are genbank, GFF, and SnapGene\'s .dna')
        parser.add_argument(
            'image',
            action='store',
            default=home,
            help='Output image filename with extension')

        return parser.parse_args()

    except:
        print("An exception occured with argument parsing. Check your provided options.")
        sys.exit(1)

def main():

    args = parse_args()

    if os.path.splitext(args.seqfile)[1] == '.dna':
        print("Input file is a SnapGene DNA file. Calling snapgene_reader to convert to BioPython.")
        from snapgene_reader import snapgene_file_to_seqrecord
        seqrecord = snapgene_file_to_seqrecord(args.infile)
    else:
        pass

    graphic_record = MyCustomTranslator().translate_record(args.seqfile)
    ax, _ = graphic_record.plot(figure_width=10)
    ax.figure.tight_layout()
    ax.figure.savefig(args.image)

if __name__ == '__main__':
    main()
Zulko commented 6 years ago

Hey there,

I agree more examples would be a good thing. There are several possible answers to your questions.

First, keep in mind that the feature objects in the Biotranslator refer to Biopython Feature objects. So the Biopython docs will tell you everything about their structure.

Also, if a feature has qualifiers "color" or "label", these will be used by default by the BioPythonTranslator (unless you have a custom BiopythonTranslator where you overwrite this behavior). That means that instead of putting your logics in your custom translator, you can also pre-process your biopython record before you feed it to the translator:

seqrecord = snapgene_file_to_seqrecord(args.infile)#
for feature in seqrecord:
    feature.qualifiers["color"] = 'blue'
    # ... implement other rules
graphic_record =Translator().translate_record(args.seqfile)

This being said, here are some ways of doing what you want from the Translator:

For the gene product, I would do it as follows:

def compute_feature_label(self, feature):
    if feature.type == "CDS":
        return feature.qualifiers["product"]
        # or even more robust, will handle the case with 0 or 2+ products:
        return " ".join(feature.qualifiers.get("product", [""]))

For iterating through colors, i would do it as follows:

from itertools import cycle
color_iterator = cycle(['blue', 'red', 'green', 'purple'])

def compute_feature_color(self, feature):
        if feature.type == "CDS":
            return next(color_iterator)

also have a look at matplotlib.colors for ways of generating color palettes.

Zulko commented 6 years ago

Hey did it work for you ? Can I close the issue ?

jrjhealey commented 6 years ago

Hey Zulko, havent had chance to try it yet but its given me some ideas - feel free to close if you like!

MeggyC commented 5 years ago

Hi there, this is what I am looking for (the "For the gene product, I would do it as follows:" part)

I attempted to do:


def compute_feature_label(self, feature):
    if feature.type == "CDS":
        return feature.qualifiers["product"]

graphic_record = BiopythonTranslator().translate_record("/srv/projects/coral/Accelerate_project/analyses/20190408_prokka_VirSorter_assemblies/df50_prokka_output/146assemblyVIRSorter_k141_30585_flag=3_multi=368_0000_len=9522-circular-cat_1/PROKKA_05142019.gbk")

compute_feature_label(graphic_record, 'CDS')

ax, _ = graphic_record.plot(figure_width=10)

Here I get the error message:


AttributeError Traceback (most recent call last)

in 1 graphic_record = BiopythonTranslator().translate_record("/srv/projects/coral/Accelerate_project/analyses/20190408_prokka_VirSorter_assemblies/df50_prokka_output/146assemblyVIRSorter_k141_30585_flag=3_multi=368_0000_len=9522-circular-cat_1/PROKKA_05142019.gbk") ----> 2 compute_feature_label(graphic_record, 'CDS') 3 ax, _ = graphic_record.plot(figure_width=10) in compute_feature_label(self, feature) 1 def compute_feature_label(self, feature): ----> 2 if feature.type == "CDS": 3 return feature.qualifiers["product"] 4 AttributeError: 'str' object has no attribute 'type' I'm not sure what to do next, it would be great to have an example of how to alter the labels of the graphic_record object, I think many bioinformaticians/biologists would find this tool extremely useful if so. M.
Zulko commented 5 years ago

@MeggyC Follow the example at the end of the README: you must define a custom Translator class, not simply a compute_feature_label method. It should look like this:

class CustomTranslator(BiopythonTranslator):
    def compute_feature_label(self, feature):
        if feature.type == "CDS":
            return feature.qualifiers["product"]
        else:
            # Return the default label:
            return BiopythonTranslator.compute_feature_label(self, feature)

path = "/srv/projects/coral/Accelerate_project/analyses/20190408_prokka_VirSorter_assemblies/df50_prokka_output/146assemblyVIRSorter_k141_30585_flag=3_multi=368_0000_len=9522-circular-cat_1/PROKKA_05142019.gbk"
translator = CustomTranslator()
graphic_record = translator.translate_record(path)
ax, _ = graphic_record.plot(figure_width=10)
MeggyC commented 5 years ago

Hey there @Zulko - thanks for the explanation! Running precisely what you gave me flags up the following error:

AttributeError Traceback (most recent call last)

in 9 translator = CustomTranslator() 10 graphic_record = translator.translate_record(path) ---> 11 ax, _ = graphic_record.plot(figure_width=10) ~/.conda/envs/Megan_environment/lib/python3.6/site-packages/dna_features_viewer/GraphicRecord.py in plot(self, ax, figure_width, draw_line, with_ruler, plot_sequence, annotate_inline, level_offset, x_lim) 270 ax=ax, feature=feature, level=level 271 ) --> 272 nlines = len(feature.label.split("\n")) 273 line_height = height / nlines 274 # Hey look, a magic number! AttributeError: 'BiopythonTranslator' object has no attribute 'split' If you would like to troubleshoot using my file then I have attached it :D (file type changed to txt via simple rename as GitHub does not allow for gbk file types - ) [PROKKA_05142019txt.txt](https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer/files/3221837/PROKKA_05142019txt.txt)
Zulko commented 5 years ago

I think its because you need to upgrade DnaFeaturesViewer to the latest version.

pip install --upgrade dna_features_viewer
MeggyC commented 5 years ago

Hey there Zulko. Unfortunately this didn't change the situation, even after the pip update. Any hints? If it's any help I'm running python 3 from within a linux shell.

Zulko commented 5 years ago

Oh sorry for that, there was an error in the example I gave above. I corrected it but here it is again just in case:

class CustomTranslator(BiopythonTranslator):
    def compute_feature_label(self, feature):
        if feature.type == "CDS":
            return feature.qualifiers["product"]
        else:
            # Return the default label:
            return BiopythonTranslator.compute_feature_label(self, feature)

path = "/srv/projects/coral/Accelerate_project/analyses/20190408_prokka_VirSorter_assemblies/df50_prokka_output/146assemblyVIRSorter_k141_30585_flag=3_multi=368_0000_len=9522-circular-cat_1/PROKKA_05142019.gbk"
translator = CustomTranslator()
graphic_record = translator.translate_record(path)
ax, _ = graphic_record.plot(figure_width=10)

This should work (or at least solve the bug you are facing)

MeggyC commented 5 years ago

Hey Zulko, I see no change between the example from 5 days ago and the one just posted above, and therefore I get get the same error message. Hmmmm

Zulko commented 5 years ago

The change is compute_feature_label on this line:

return BiopythonTranslator.compute_feature_label(self, feature)

Weird that it still doesnt work. I'll have another look later.

MeggyC commented 5 years ago

Thanks! Looking forward to it :D

Zulko commented 5 years ago

Can you confirm that you see the exact same error when you run the code above?

MeggyC commented 5 years ago

Hey there,

It's not the same error, excuse me.

The older code returns the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-1efb7adc5581> in <module>
     12 translator = CustomTranslator()
     13 graphic_record = translator.translate_record(path)
---> 14 ax, _ = graphic_record.plot(figure_width=10)

~/.conda/envs/Megan_environment/lib/python3.6/site-packages/dna_features_viewer/GraphicRecord.py in plot(self, ax, figure_width, draw_line, with_ruler, plot_sequence, annotate_inline, level_offset, x_lim)
    270                     ax=ax, feature=feature, level=level
    271                 )
--> 272                 nlines = len(feature.label.split("\n"))
    273                 line_height = height / nlines
    274                 # Hey look, a magic number!

AttributeError: 'BiopythonTranslator' object has no attribute 'split'

Whilst the newer edit returns:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-0638680d98bb> in <module>
     13 translator = CustomTranslator()
     14 graphic_record = translator.translate_record(path)
---> 15 ax, _ = graphic_record.plot(figure_width=10)

~/.conda/envs/Megan_environment/lib/python3.6/site-packages/dna_features_viewer/GraphicRecord.py in plot(self, ax, figure_width, draw_line, with_ruler, plot_sequence, annotate_inline, level_offset, x_lim)
    270                     ax=ax, feature=feature, level=level
    271                 )
--> 272                 nlines = len(feature.label.split("\n"))
    273                 line_height = height / nlines
    274                 # Hey look, a magic number!

AttributeError: 'list' object has no attribute 'split'
Zulko commented 5 years ago

Ok, that must be because Biopython loads sometimes loads information as as list (['label']) rather than strings ('label'). Here is the fix:

class CustomTranslator(BiopythonTranslator):
    def compute_feature_label(self, feature):
        if feature.type == "CDS":
            return "".join(feature.qualifiers["product"])
        else:
            # Return the default label:
            return BiopythonTranslator.compute_feature_label(self, feature)

path = "/srv/projects/coral/Accelerate_project/analyses/20190408_prokka_VirSorter_assemblies/df50_prokka_output/146assemblyVIRSorter_k141_30585_flag=3_multi=368_0000_len=9522-circular-cat_1/PROKKA_05142019.gbk"
translator = CustomTranslator()
graphic_record = translator.translate_record(path)
ax, _ = graphic_record.plot(figure_width=10)
MeggyC commented 5 years ago

Brilliant, that works like a dream! Thanks!