ModelSEED / ModelSEEDDatabase

This repository contains the definitive copy of the biochemistry and metadata used to construct models using the ModelSEED/ProbAnno approach.

Integrating Pathway Data #25

Open samseaver opened 8 years ago

samseaver commented 8 years ago

This is from an old email of Chris' that I left open. I'm initiating some of this via a commit coming today, but I'm posting Chris' email here as a placeholder for the discussion:

"I would like us to deal far more thoroughly with pathways, including such data from KEGG, MetaCyc, and BIGG. In playing with BIGG recently, I realize they now offer easy download of virtually all their data, and we should exploit this.

I propose we have an entire primary table dedicated just to pathways, with the following columns:

Name: primary human-readable name for the pathway
ID: we should assign our own consistent IDs to pathways (e.g. path.1, path.2, etc.)
Source: where the pathway came from (e.g. KEGG, MetaCyc, BIGG)
Source_ID: the ID of the pathway in the database it came from
Aliases: list of aliases for the pathway
Reactions: list of reactions in the pathway

Pathways are useful in a thousand ways, and not handling them thoroughly has been a real impediment for us.

I can think of many pathway sources we could integrate: KEGG, MetaCyc, BIGG, Subsystems, Scenarios. And we should do them all.

But a key question: should we be working towards ultimately reconciling this data to maintain our own pathway ontology? Something to think about. The issue with integrated pathway data is that such data will never be consistently applied across the board.

I generally don't want to sign on for excessive curation commitments, but if we could come up with a computational rule we could automatically apply to maintain our own pathway ontology, I would favor that..."
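
To make the proposed table concrete, here is a minimal sketch of how its rows could be represented and written out as tab-separated text. Everything beyond the six column names and the path.N ID pattern from the email is an assumption for illustration: the `Pathway` class, the `assign_ids` and `to_tsv` helpers, the `;` separator for list-valued columns, and the example rows (the reaction IDs are placeholders, not curated pathway contents). It is not the format used by any commit in this repository.

```python
import csv
import io
from dataclasses import dataclass, field

# Column order as proposed in the email.
COLUMNS = ["Name", "ID", "Source", "Source_ID", "Aliases", "Reactions"]


@dataclass
class Pathway:
    """One row of the hypothetical pathway table."""
    name: str
    source: str          # e.g. KEGG, MetaCyc, BIGG
    source_id: str       # ID of the pathway in the source database
    aliases: list = field(default_factory=list)
    reactions: list = field(default_factory=list)
    id: str = ""         # filled in by assign_ids()


def assign_ids(pathways, prefix="path."):
    """Give each pathway a sequential ID: path.1, path.2, ..."""
    for n, pathway in enumerate(pathways, start=1):
        pathway.id = f"{prefix}{n}"
    return pathways


def to_tsv(pathways, list_sep=";"):
    """Serialize pathways as TSV; list columns are joined with list_sep."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for p in pathways:
        writer.writerow([
            p.name, p.id, p.source, p.source_id,
            list_sep.join(p.aliases), list_sep.join(p.reactions),
        ])
    return buf.getvalue()


# Illustrative rows only; the reaction IDs are placeholders.
example = assign_ids([
    Pathway("Glycolysis / Gluconeogenesis", "KEGG", "map00010",
            aliases=["Glycolysis"], reactions=["rxnA", "rxnB"]),
    Pathway("glycolysis I", "MetaCyc", "GLYCOLYSIS",
            reactions=["rxnA", "rxnC"]),
])
print(to_tsv(example))
```

Keeping Source and Source_ID as separate columns means the same biological pathway imported from two databases stays as two rows, which leaves the reconciliation question raised in the email open rather than deciding it at import time.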