Edinburgh-Genome-Foundry / DnaFeaturesViewer

:eye: Python library to plot DNA sequence features (e.g. from Genbank files)
https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/
MIT License
584 stars 90 forks source link

gff files #2

Closed anmwinter closed 4 years ago

anmwinter commented 7 years ago

Thanks for writing this in python. I've been looking for a feature viewer for a few weeks. I was wondering how do I read in a gff file to be viewed.

Thanks!

Zulko commented 7 years ago

Hi @bioinfonm thanks for your interest. I guess translating gff files shouldn't be too complicated since Biopython supports gff. To win time, can you point me to an example gff file ?

anmwinter commented 7 years ago

Yes @Zulko ! Here is a zipped up gff file: https://github.com/bioinfonm/microbiome-of-the-subsurface-environment/blob/master/plants/longest_orfs.gff3.zip

and I attached the zipped file here as well. longest_orfs.gff3.zip

Thanks for the quick response! I tried using the biopython gff parser. But I am a little rusty on my coding skills right now.

ara

fabricecarles commented 7 years ago

Hi, I confirm that gff work well, I am using it on protein sequence. Consider you have download .fasta and .gff in working directory you could create biopython sequenceRecord object using this function

from Bio import SeqIO
def load_sequence(uniprotID):
    in_seq_handle = open(uniprotID+".fasta")
    seq_dict = SeqIO.to_dict(SeqIO.parse(in_seq_handle, "fasta"))
    in_seq_handle.close()
    in_handle = open(uniprotID+".gff")
    f_list=[] # initialize features list
    for rec in GFF.parse(in_handle, base_dict=seq_dict):
        print(rec)
        for f in rec.features:
            f_list.append(f)
    in_handle.close()
    seq_dict[list(seq_dict.keys())[0]].features=f_list
    return(seq_dict[list(seq_dict.keys())[0]])

Then you could extract sequence record (here P53350 = basename of fasta and gff files ) seqrec=load_sequence('P53350') And finally

from dna_features_viewer import BiopythonTranslator
graphic_record = bt.translate_record(seqrec)
ax, _ = graphic_record.plot(figure_width=10)

At this time everything is ok for me.

However, I encountered difficulties in filtering features of interest...
For example hwo can I filter to draw only the domain Protein kinase ?

type: Domain
location: [52:305]
qualifiers:
    Key: Note, Value: ['Protein kinase']
    Key: Ontology_term, Value: ['ECO:0000255']
    Key: evidence, Value: ['ECO:0000255|PROSITE-ProRule:PRU00159']
    Key: source, Value: ['UniProtKB']

loocking in the class BiopythonTranslator don't help me because the arguments are not fully documented

A translator from SeqRecords to dna_features_viewer GraphicRecord.
    Parameters
    ----------
    features_filters
      List of filters (some_biopython_feature) => True/False.
      Only features passing all the filters are kept.
    features_properties
      A function (feature)=> properties_dict

Could you help me on this ? I need an example

Zulko commented 7 years ago

First define a filter that returns True if your feature corresponds to what you want:

def feature_is_protein_kinase(feature):
    feature_note = feature.qualifiers.get('Note', '')
    feature_note = "".join(feature_note) # in case your note is a list and not a string
    return feature_note == "Protein kinase"

Then include the filter in your translator:

from dna_features_viewer import BiopythonTranslator
translator = BiopythonTranslator(features_filter = [feature_is_protein_kinase])
graphic_record = translator.translate_record(seqrec)
ax, _ = graphic_record.plot(figure_width=10)

One solution I generally use, is to have the relevant feature plotted in red, and all other features plotted in white and without label (so that you can see the highlighted feature in some "context")

from dna_features_viewer import BiopythonTranslator

class MyCustomTranslator(BiopythonTranslator):
    @staticmethod
    def compute_feature_color(f):
        return 'red' if feature_is_protein_kinase(f) else 'white'

    @staticmethod
    def compute_feature_label(f):
        default = BiopythonTranslator.compute_feature_label(f)
        return  default if feature_is_protein_kinase(f) else None

translator = MyCustomTranslator()
graphic_record = translator.translate_record(seqrec)
ax, _ = graphic_record.plot(figure_width=10)
fabricecarles commented 7 years ago

Thanks, It's perfect ! very nice job your package will be very useful for my project Just one suggestion for future : make all optional variables in argument of your function. To do that without changing the global comportement you can set default parameter. For exemple you could change the code like this

 def compute_feature_label(feature,qualifiers_list=["label", "source", "locus_tag", "note"]):
        """Gets the 'label' of the feature."""
        result = feature.type
        for key in qualifiers_list:
            if key in feature.qualifiers:
                result = feature.qualifiers[key]
                break
        if isinstance(result, list):
            return "|".join(result)
        else:
            return result

Now users can see what is your default parameters using help() and It's better if we want to costumize your code. Have a good day

Zulko commented 4 years ago

Thanks all for the suggestions. I'll close this issue as Biopython.translate_record() now supports GFF files via the BCBio library).