Edinburgh-Genome-Foundry / DnaFeaturesViewer

:eye: Python library to plot DNA sequence features (e.g. from Genbank files)
https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/
MIT License
584 stars 90 forks source link

feature location in a join, how to correnctly translate #85

Open albodrug opened 4 months ago

albodrug commented 4 months ago

Hello,

Thanks for the great tool.

I have downloaded a .gb file in a similar way to article_examples/0_download_the_plasmid.py I have a gene with several isoforms and each isoform is summarized in a single CDS feature with a join for positions, like this:

CDS             complement(join(754..944,1514..1784,1858..2045,2923..3103,
                     4147..4728))

When viewing the gene with dna_feature_viewer, it looks like only the first coordinate and the last coordinate are taken into account, making the CDS span from 754 to 4728, when I would like 5 chunks corresponding to the five CDS ranges specified in the join.

Is this possible with DnaFeatureViewer, or should I split these CDS manually?

Here is the picture I obtain: Screenshot from 2024-04-17 15-46-49

Thanks! Alex

veghp commented 4 months ago

Thanks for trying out this tool.

Indeed, join() Genbank features are translated into one feature with the first start and last end locations. In detail, .translate_record("plasmid.gb") calls dna_features_viewer.biotools.load_record() which returns features with CompoundLocations, which are then translated by BiopythonTranslatorBase.translate_record() into GraphicFeatures.

You can try creating a new set of features using a loop:

in the example plasmid file I added this to FEATURES:

     CDS             complement(join(754..944,1514..1784,1858..2045,2923..3103,
                     4147..4728))
                     /gene="test(complement(join)"

using the example in A_linear_plot.py:

from dna_features_viewer import BiopythonTranslator
class CustomTranslator(BiopythonTranslator):

    # Label fields indicates the order in which annotations fields are
    # considered to determine the feature's label
    label_fields = ["label", "note", "name", "gene"]

    def compute_feature_legend_text(self, feature):
        return feature.type

    def compute_feature_color(self, feature):
        return {
            "rep_origin": "yellow",
            "CDS": "#ffd383",  # light orange
            "regulatory": "red",
            "misc_recomb": "#fbf3f6",  # pink
            "misc_feature": "#d1e9f1",  # light blue
            "backbone": "darkblue",
        }[feature.type]

    def compute_feature_box_color(self, feature):
        return "white"

    def compute_feature_box_linewidth(self, feature):
        return 0

# ADDED NEW SECTION:
import copy
from dna_features_viewer import biotools

plasmid = biotools.load_record("plasmid.gb")
for feature in plasmid.features:
    if len(feature.location.parts) != 1:  # implicit check, compound locations will have multiple parts
        for part in feature.location.parts:
            new_feature = copy.deepcopy(feature)  # otherwise overwrites the original feature
            new_feature.location = copy.deepcopy(part)
            plasmid.features += [new_feature]

translator = BiopythonTranslator()
graphic_record = translator.translate_record(plasmid)  # note this is the record, not the file
ax, _ = graphic_record.plot(figure_width=13, strand_in_label_threshold=7)
graphic_record.plot_legend(ax=ax, loc=1, ncol=3, frameon=False)
ax.figure.savefig("A_linear_plot.svg", bbox_inches="tight")

A_linear_plot

veghp commented 4 months ago

There may also be room for code improvement: BiopythonTranslatorBase.translate_record() calls self.compute_filtered_features() but this is defined only in the child class BiopythonTranslator. Proposed solution is to move compute_filtered_features() to the parent class.

albodrug commented 4 months ago

Thanks a ton for the insight! This is super helpful.