Closed anmwinter closed 4 years ago
Hi @bioinfonm thanks for your interest. I guess translating gff files shouldn't be too complicated since Biopython supports gff. To win time, can you point me to an example gff file ?
Yes @Zulko ! Here is a zipped up gff file: https://github.com/bioinfonm/microbiome-of-the-subsurface-environment/blob/master/plants/longest_orfs.gff3.zip
and I attached the zipped file here as well. longest_orfs.gff3.zip
Thanks for the quick response! I tried using the biopython gff parser. But I am a little rusty on my coding skills right now.
ara
Hi, I confirm that gff work well, I am using it on protein sequence. Consider you have download .fasta and .gff in working directory you could create biopython sequenceRecord object using this function
from Bio import SeqIO
def load_sequence(uniprotID):
in_seq_handle = open(uniprotID+".fasta")
seq_dict = SeqIO.to_dict(SeqIO.parse(in_seq_handle, "fasta"))
in_seq_handle.close()
in_handle = open(uniprotID+".gff")
f_list=[] # initialize features list
for rec in GFF.parse(in_handle, base_dict=seq_dict):
print(rec)
for f in rec.features:
f_list.append(f)
in_handle.close()
seq_dict[list(seq_dict.keys())[0]].features=f_list
return(seq_dict[list(seq_dict.keys())[0]])
Then you could extract sequence record (here P53350 = basename of fasta and gff files )
seqrec=load_sequence('P53350')
And finally
from dna_features_viewer import BiopythonTranslator
graphic_record = bt.translate_record(seqrec)
ax, _ = graphic_record.plot(figure_width=10)
At this time everything is ok for me.
However, I encountered difficulties in filtering features of interest...
For example hwo can I filter to draw only the domain Protein kinase ?
type: Domain
location: [52:305]
qualifiers:
Key: Note, Value: ['Protein kinase']
Key: Ontology_term, Value: ['ECO:0000255']
Key: evidence, Value: ['ECO:0000255|PROSITE-ProRule:PRU00159']
Key: source, Value: ['UniProtKB']
loocking in the class BiopythonTranslator don't help me because the arguments are not fully documented
A translator from SeqRecords to dna_features_viewer GraphicRecord.
Parameters
----------
features_filters
List of filters (some_biopython_feature) => True/False.
Only features passing all the filters are kept.
features_properties
A function (feature)=> properties_dict
Could you help me on this ? I need an example
First define a filter that returns True if your feature corresponds to what you want:
def feature_is_protein_kinase(feature):
feature_note = feature.qualifiers.get('Note', '')
feature_note = "".join(feature_note) # in case your note is a list and not a string
return feature_note == "Protein kinase"
Then include the filter in your translator:
from dna_features_viewer import BiopythonTranslator
translator = BiopythonTranslator(features_filter = [feature_is_protein_kinase])
graphic_record = translator.translate_record(seqrec)
ax, _ = graphic_record.plot(figure_width=10)
One solution I generally use, is to have the relevant feature plotted in red, and all other features plotted in white and without label (so that you can see the highlighted feature in some "context")
from dna_features_viewer import BiopythonTranslator
class MyCustomTranslator(BiopythonTranslator):
@staticmethod
def compute_feature_color(f):
return 'red' if feature_is_protein_kinase(f) else 'white'
@staticmethod
def compute_feature_label(f):
default = BiopythonTranslator.compute_feature_label(f)
return default if feature_is_protein_kinase(f) else None
translator = MyCustomTranslator()
graphic_record = translator.translate_record(seqrec)
ax, _ = graphic_record.plot(figure_width=10)
Thanks, It's perfect ! very nice job your package will be very useful for my project Just one suggestion for future : make all optional variables in argument of your function. To do that without changing the global comportement you can set default parameter. For exemple you could change the code like this
def compute_feature_label(feature,qualifiers_list=["label", "source", "locus_tag", "note"]):
"""Gets the 'label' of the feature."""
result = feature.type
for key in qualifiers_list:
if key in feature.qualifiers:
result = feature.qualifiers[key]
break
if isinstance(result, list):
return "|".join(result)
else:
return result
Now users can see what is your default parameters using help() and It's better if we want to costumize your code. Have a good day
Thanks all for the suggestions. I'll close this issue as Biopython.translate_record()
now supports GFF files via the BCBio library).
Thanks for writing this in python. I've been looking for a feature viewer for a few weeks. I was wondering how do I read in a gff file to be viewed.
Thanks!