bio2bel / expasy

A Bio2BEL package for converting ExPASy to BEL
MIT License

Use PyUniProt to incorporate members of Enzyme classes #5

Closed: cthoyt closed this issue 7 years ago

cthoyt commented 7 years ago

PyUniProt makes it possible to iterate directly over all members of each enzyme class.

  1. Generate the final portion of the EC tree
  2. Generate an EC-UniProt mapping. Even better, PyUniProt should also make it possible to generate an EC-HGNC mapping; might as well do both if they're both easy.
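
As a rough sketch of step 2, the two mappings could be built in one pass over the annotations. This assumes the annotations have already been pulled out of PyUniProt as plain `(ec_number, uniprot_accession, hgnc_symbol)` triples; the function name `build_ec_mappings` and the triple layout are hypothetical, not part of PyUniProt.

```python
from collections import defaultdict


def build_ec_mappings(annotations):
    """Group (ec_number, uniprot_accession, hgnc_symbol) triples by EC number.

    The triple layout is an assumption made for this sketch. Entries
    without an HGNC symbol (hgnc_symbol is None) are skipped in the
    EC-HGNC mapping but still counted in the EC-UniProt mapping.
    """
    ec_to_uniprot = defaultdict(set)
    ec_to_hgnc = defaultdict(set)
    for ec_number, accession, hgnc_symbol in annotations:
        ec_to_uniprot[ec_number].add(accession)
        if hgnc_symbol is not None:
            ec_to_hgnc[ec_number].add(hgnc_symbol)
    return dict(ec_to_uniprot), dict(ec_to_hgnc)


# Illustrative input: two human alcohol dehydrogenases annotated with EC 1.1.1.1
example = [
    ("1.1.1.1", "P07327", "ADH1A"),
    ("1.1.1.1", "P00325", "ADH1B"),
]
ec_to_uniprot, ec_to_hgnc = build_ec_mappings(example)
```

Using sets keeps the mappings free of duplicates if the same protein is annotated more than once with the same EC number.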

aramgrigoryan commented 7 years ago

  1. What is the "final portion" of the EC tree?

cthoyt commented 7 years ago

The final portion is the 4th level, which refers to the actual proteins that have the activities described by the first 3 levels of the ExPASy tree. You can go through UniProt to get these annotations and use text parsing to infer the parents.

http://pyuniprot.readthedocs.io/en/latest/query_functions.html#ec-number

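import pyuniprot

query = pyuniprot.query()
for ec in query.ec_number():
    # Get the actual EC number, infer the parent, and collect the annotated children.
    number = ec.ec_number  # assumed attribute holding e.g. "1.1.1.1"
    parent = number.rsplit(".", 1)[0] + ".-"  # e.g. "1.1.1.-"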
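
The "text parsing to infer the parents" step could look something like the sketch below. It assumes EC numbers are written as four dot-separated levels with `-` for unspecified levels (for example `1.1.1.-`); the helper name `ec_ancestors` is made up for illustration.

```python
def ec_ancestors(ec_number):
    """Return the parents of an EC number, nearest first.

    Assumes the four-level dotted form, with "-" marking unspecified
    levels, e.g. "1.1.1.1" or "1.1.-.-".
    """
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated levels: %r" % ec_number)
    # Number of levels that are actually specified (not "-")
    depth = sum(1 for part in parts if part != "-")
    ancestors = []
    for level in range(depth - 1, 0, -1):
        # Keep the first `level` parts and pad the rest with "-"
        ancestors.append(".".join(parts[:level] + ["-"] * (4 - level)))
    return ancestors


parents = ec_ancestors("1.1.1.1")  # nearest parent first
```

The first element of the result is the direct parent, so a 4th-level entry can be attached to the existing 3-level tree without a lookup table.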
aramgrigoryan commented 7 years ago

I am going to query only human entries, as the full query takes forever and has a memory leak.

cthoyt commented 7 years ago

Which part? Does it not provide a lazy iterator over the database?

Because we fixed the memory leak with loading, and now it only uses 117 MB total :)
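
For reference, this is roughly what a lazy iterator over a database means here. The sketch uses the standard library's `sqlite3` as a stand-in and a hypothetical `ec_annotation` table; it is not how PyUniProt is implemented, only an illustration of fetching rows in batches so memory use stays bounded.

```python
import sqlite3


def iter_rows(connection, sql, batch_size=1000):
    """Yield rows one at a time while fetching them in fixed-size batches.

    Only `batch_size` rows are held in memory at once, instead of the
    whole result set.
    """
    cursor = connection.execute(sql)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row


# Small in-memory demo with a hypothetical table layout
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE ec_annotation (ec_number TEXT, accession TEXT)")
connection.executemany(
    "INSERT INTO ec_annotation VALUES (?, ?)",
    [("1.1.1.1", "P07327"), ("1.1.1.1", "P00325")],
)
rows = list(iter_rows(connection, "SELECT ec_number, accession FROM ec_annotation", batch_size=1))
```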

aramgrigoryan commented 7 years ago

It's looking fine now. It seems to be working; I just have to wait.