ModelSEED / Model-SEED-core

The Model SEED is a tool for building, curating, and analyzing genome-scale metabolic models. Visit the Model SEED homepage for installation instructions and full feature documentation.
http://bionet.mcs.anl.gov/index.php/Model_SEED_Homepage

Compound assignment feature request #143

Open samseaver opened 12 years ago

samseaver commented 12 years ago

One of the things that will happen as we work with teams to manually curate their metabolic models is that we will find compounds that have been erroneously merged, or erroneously kept separate. Sugars are a common culprit here because of stereoisomerism.

What I need, as we move forward and discover these cases, is a way of either merging or splitting compounds. A merge would be straightforward, as all aliases and reactions would simply be combined, but a split is more complex because one must make sure that the correct InChI strings and synonyms go with the correct compound.
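To make the merge case concrete, here is a minimal Python sketch. It assumes a hypothetical data model (the `Compound` and `Reaction` classes and `merge_compounds` are illustrative, not the Model SEED API) where a reaction stores its stoichiometry as a map from compound id to coefficient. Merging unions the aliases and names, then rewrites every reaction that referenced the dropped compound:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified records; not the Model SEED object model.
@dataclass
class Compound:
    id: str
    aliases: set[str] = field(default_factory=set)
    names: set[str] = field(default_factory=set)

@dataclass
class Reaction:
    id: str
    # compound id -> stoichiometric coefficient (negative = consumed)
    stoichiometry: dict[str, float] = field(default_factory=dict)

def merge_compounds(keep: Compound, drop: Compound, reactions: list[Reaction]) -> list[str]:
    """Fold `drop` into `keep` and return the ids of the reactions that were rewritten."""
    keep.aliases |= drop.aliases
    keep.names |= drop.names
    touched = []
    for rxn in reactions:
        if drop.id not in rxn.stoichiometry:
            continue
        coeff = rxn.stoichiometry.pop(drop.id)
        total = rxn.stoichiometry.get(keep.id, 0) + coeff
        if total == 0:
            # The compound cancels out, e.g. a reaction interconverting the two isomers.
            rxn.stoichiometry.pop(keep.id, None)
        else:
            rxn.stoichiometry[keep.id] = total
        touched.append(rxn.id)
    return touched
```

The cancellation branch matters for stereoisomers: a reaction that interconverts the two forms collapses to an empty (or no-op) reaction after the merge, which is itself something a curator would want flagged.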

In addition, all reactions in the database and in the models that use these compounds must be updated. If a compound is split into two compounds, then care must be taken to check whether the new reactions that emerge actually exist.
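One way to surface "the new reactions that emerge" from a split is to propose, for every reaction using the original compound, a variant that uses the new compound instead, and leave each candidate for a curator to confirm or reject. A minimal Python sketch, assuming a hypothetical layout where reactions are a dict of reaction id to `{compound id: coefficient}`; the function name and the `<reaction id>_<new id>` naming of candidates are illustrative only:

```python
def candidate_split_reactions(
    old_id: str, new_id: str, reactions: dict[str, dict[str, float]]
) -> dict[str, dict[str, float]]:
    """Propose a variant of every reaction that uses old_id, with new_id substituted.

    Nothing is written back: the input reactions are left untouched, and each
    returned candidate still has to be checked by hand before it is added.
    """
    candidates = {}
    for rxn_id, stoich in reactions.items():
        if old_id in stoich:
            variant = dict(stoich)  # shallow copy so the original reaction is unchanged
            variant[new_id] = variant.pop(old_id)
            candidates[f"{rxn_id}_{new_id}"] = variant
    return candidates
```

Returning candidates rather than inserting them keeps the decision of whether each reaction really exists with the new stereoisomer in the curator's hands.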

devoid commented 12 years ago

Would it be sufficient to start with a simple copy() command on the compound for the split? You could then manually add/remove stuff from each compound.

For reactions, compoundSets, and media, I'm not sure what the correct base-level behavior for a copy() call would be. It would not be difficult to create new reactions, add the compound to the compoundSet, and create new media conditions. However, I think these should probably be handled independently, e.g. with a specific function for each of these areas.

Does this make sense? I'm trying to think of names for the "copyReactions", "copyMedia", and "updateCompoundSets" functions, and of how they would look.

samseaver commented 12 years ago

The copy()-then-manual-assignment strategy works. There are going to be very few people who do this, and as time goes by, we'll be doing it less and less. How would I manually alter the two compounds: by printing them, editing them, and re-loading them? I like this, because I can keep the edited files in my history.
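A print/edit/re-load cycle mostly needs a text format that diffs cleanly under version control. A minimal Python sketch, assuming compounds can be represented as plain dicts and that `id`, `names`, and `aliases` are required fields (both are assumptions for illustration, not the Model SEED file format):

```python
import json
from pathlib import Path

# Hypothetical minimum field set for an editable compound record.
REQUIRED_FIELDS = {"id", "names", "aliases"}

def to_text(compound: dict) -> str:
    """Serialize a compound as stable, diff-friendly JSON (sorted keys, indented)."""
    return json.dumps(compound, indent=2, sort_keys=True) + "\n"

def from_text(text: str) -> dict:
    """Parse an edited compound, rejecting records that lost a required field."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("compound file must contain a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"compound is missing fields: {sorted(missing)}")
    return data

def dump_compound(compound: dict, path: Path) -> None:
    path.write_text(to_text(compound))

def load_compound(path: Path) -> dict:
    return from_text(path.read_text())
```

Sorting keys means two dumps of the same compound are byte-identical, so the history only shows the edits that were actually made.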

A split will invariably mean separating a less commonly seen compound from a more common one (some stereoisomers occur less often than others). I would therefore keep the original id for the more common compound, which means that the current set of reactions, compoundSets, and media would stay the same. If, after the split, the less commonly used compound does belong in its own reaction/compoundSet/media, then those too would have to be copied and altered.

Perhaps a generic copy() command for any biochemistry database object can be used?

It would be up to me as a biochemistry curator to make sure I don't miss anything, so if I do a copy() on the compound, I'd like the copy() function to write to a file the list of reactions/compoundSets/media that I would have to consider.
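The requested behavior (copy only the compound, then report every object that references the original) can be sketched in Python as follows. This assumes a hypothetical in-memory layout of the biochemistry database, described in the comments; `copy_compound` and the tab-separated report format are illustrative, not an existing Model SEED function:

```python
import copy
from pathlib import Path
from typing import Optional

# Hypothetical in-memory layout of a biochemistry database:
#   "compounds":    {compound_id: {...record...}}
#   "reactions":    {reaction_id: {compound_id: coefficient}}
#   "compoundSets": {set_id: set of compound_ids}
#   "media":        {media_id: {compound_id: concentration}}
DEPENDENT_KINDS = ("reactions", "compoundSets", "media")

def copy_compound(biochem: dict, compound_id: str, new_id: str,
                  report_path: Optional[Path] = None) -> list[str]:
    """Clone a compound under new_id and list every object that uses the original.

    Only the compound itself is copied; the returned list (optionally written
    to report_path, one "kind<TAB>id" per line) is what a curator must review.
    """
    if new_id in biochem["compounds"]:
        raise ValueError(f"{new_id} already exists")
    clone = copy.deepcopy(biochem["compounds"][compound_id])  # independent of the original
    clone["id"] = new_id
    biochem["compounds"][new_id] = clone

    report = []
    for kind in DEPENDENT_KINDS:
        for obj_id, members in sorted(biochem[kind].items()):
            if compound_id in members:  # membership test works for dicts and sets
                report.append(f"{kind}\t{obj_id}")
    if report_path is not None:
        Path(report_path).write_text("".join(line + "\n" for line in report))
    return report
```

Keeping the original id on the more common isomer, as described above, means the report is a to-do list rather than a set of changes: nothing in the reactions, compoundSets, or media is modified until the curator decides.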