ModelSEED / ModelSEEDDatabase

This repository contains the definitive copy of the biochemistry and metadata used to construct models using the ModelSEED/ProbAnno approach.

Question on compounds and reactions master files #2

Closed mmundy42 closed 9 years ago

mmundy42 commented 9 years ago

@samseaver, I'm confused on the compounds.master.tsv and reactions.master.tsv files in the Biochemistry directory. I ran the Print_Master_Reactions.pl script and it generated a Master_Reaction_List.tsv file. When I compare that file with reactions.master.tsv they are not the same.

It looks like the script is getting the default and plantdefault biochemistry by downloading from a KBase workspace and also processing the reactions.default.tsv and reactions.plantdefault.tsv files and applying the modifications from reactions.master.mods to generate Master_Reaction_List.tsv.

I haven't tried the Print_Master_Compounds.pl script but it looks similar.

Can you explain the roles of the various files for me?

samseaver commented 9 years ago

In theory, the Print_Master_*.pl scripts should be printing directly to the respective *.master.tsv files, but when I first created them, for testing purposes, I had them print to an intermediate Master_*_List.tsv file, which I then copy to the respective *.master.tsv file. However, this was made pretty much redundant after the introduction of the Update_Reaction_Status.pl script. Because compound modifications are now made within reactions, when printing the master reaction file, the reaction status has to be updated a priori.

So, somewhat redundantly, I copy Master_Reaction_List.tsv to reactions.master.tsv then run Update_Reaction_Status.pl then use git diff to compare differences.

You shouldn't be detecting differences between the compounds.master.tsv and Master_Compound_List.txt files (unless you change compounds.master.mods).

samseaver commented 9 years ago

The process has been made simpler by https://github.com/ModelSEED/ModelSEEDDatabase/commit/d3cf2da70506f25056f02a7d9ff86c50c25c7a15

but still one has to run Print_Master_Reactions.pl and Update_Reaction_Status.pl in order.
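The ordering requirement can be captured in a small wrapper. This is a minimal sketch, not part of the repository: `run_in_order` and its `runner` parameter are hypothetical names, the script names are the ones mentioned in this thread, and it assumes you run it from the directory that contains them.

```python
import subprocess

# Script names as mentioned in this thread; the working directory is an assumption.
STEPS = ["./Print_Master_Reactions.pl", "./Update_Reaction_Status.pl"]


def run_in_order(steps, runner=subprocess.run):
    """Run each script in sequence and stop at the first failure."""
    completed = []
    for step in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a later script never runs on top of a failed earlier one.
        runner([step], check=True)
        completed.append(step)
    return completed
```

Calling `run_in_order(STEPS)` and then running `git diff` on reactions.master.tsv mirrors the manual procedure described above.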

mmundy42 commented 9 years ago

I ran Print_Master_Compounds_List.pl, then Print_Master_Reactions_List.pl, then Update_Reaction_Status.pl failed with this error:

> ./Update_Reaction_Status.pl 
Could not find water in biochemistry! at /data2/microbiome/kbase/prod-20150521/deployment/lib/Bio/KBase/ObjectAPI/utilities.pm line 202, <FH> line 2.
Bio::KBase::ObjectAPI::utilities::error("Could not find water in biochemistry!") called at /data2/microbiome/kbase/prod-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseBiochem/Reaction.pm line 290
Bio::KBase::ObjectAPI::KBaseBiochem::Reaction::createEquation(Bio::KBase::ObjectAPI::KBaseBiochem::Reaction=HASH(0xc668f78), HASH(0xc668e58)) called at /data2/microbiome/kbase/prod-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseBiochem/Reaction.pm line 58
Bio::KBase::ObjectAPI::KBaseBiochem::Reaction::_buildgenequation(Bio::KBase::ObjectAPI::KBaseBiochem::Reaction=HASH(0xc668f78)) called at accessor Bio::KBase::ObjectAPI::KBaseBiochem::Reaction::genEquation (defined at /data2/microbiome/kbase/prod-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseBiochem/Reaction.pm line 21) line 10
Bio::KBase::ObjectAPI::KBaseBiochem::Reaction::genEquation(Bio::KBase::ObjectAPI::KBaseBiochem::Reaction=HASH(0xc668f78)) called at ./Update_Reaction_Status.pl line 83

Is there another script I need to run?

samseaver commented 9 years ago

What repo/fork/branch are you using? This should work if you use my fork of the KBaseFBAModeling repo. In any case, I'll work on merging the changes with both the KBase and ProbModelSEED repositories.

mmundy42 commented 9 years ago

Another question, why does Print_Master_Reactions_List.pl need to pull the two biochemistry databases from the kbase workspace? Shouldn't there be enough information in the reactions.default.tsv and reactions.plantdefault.tsv files?

samseaver commented 9 years ago

I don't like that I have to do that, but the one time when I need the biochemistry object is when I need to replace an actual compound in a reaction. As it happens, I spent some time trying to make it work by simply replacing the compound_ref of the reagent itself, which would have made it so that I didn't need the biochemistry objects, but for the life of me, I could not get it to work.

I think that part of the underlying problem is that even if you update the compound_ref(), the biochemistry is cached and won't recognize the compound_ref() unless you "save" the object or call a particular function to update the cache. The end result would be that calling reagent()->compound() would give the old compound whilst calling reagent()->compound_ref() would give the new compound. I could not figure this out. This was important because, in the same script, I want to call the reaction definition and code etc. and it will still print out the wrong/old compound.

It just so happened that I discovered that if you instead were to update the reagent by calling compound() directly (and feeding in the correct compound object which of course needs to be retrieved from the biochemistry object) then that would work.

If I could figure out how I can get the underlying reagent object to respond to an updated compound_ref() without having to call the biochemistry object, I would do so, and avoid loading the objects in the first place. Would you be able to have a look at this problem?

That said, of course, this is still dependent on my use of KBaseFBAModeling.
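The symptom described above (a new compound_ref alongside an old compound) is what a lazily cached lookup would produce. The following is a toy illustration only, assuming caching really is the cause as suspected above. The `Biochemistry` and `Reagent` classes, `set_compound`, and the compound ids are all made up for the example and are not the KBase ObjectAPI.

```python
class Biochemistry:
    """Toy stand-in for a biochemistry object: resolves compound ids."""

    def __init__(self, compounds):
        self.compounds = dict(compounds)  # id -> name

    def lookup(self, compound_id):
        return self.compounds[compound_id]


class Reagent:
    """Toy reagent that caches the resolved compound on first access."""

    def __init__(self, biochemistry, compound_ref):
        self._biochemistry = biochemistry
        self.compound_ref = compound_ref
        self._cached = None

    def compound(self):
        # The reference is resolved only once; later changes to
        # compound_ref are not seen because the cache is never invalidated.
        if self._cached is None:
            self._cached = self._biochemistry.lookup(self.compound_ref)
        return self._cached

    def set_compound(self, compound_id):
        # Updating through the object keeps reference and cache in sync,
        # but it needs the biochemistry to resolve the new compound.
        self.compound_ref = compound_id
        self._cached = self._biochemistry.lookup(compound_id)


bio = Biochemistry({"cpd_old": "old compound", "cpd_new": "new compound"})
reagent = Reagent(bio, "cpd_old")
reagent.compound()                # populates the cache
reagent.compound_ref = "cpd_new"  # reference changes, cache does not
stale = reagent.compound()        # still resolves to the old compound
reagent.set_compound("cpd_new")
fresh = reagent.compound()        # now resolves to the new compound
```

In this toy model the two fixes are either to invalidate the cache whenever the reference changes, or to set the resolved object directly, which is the workaround described above and the reason the biochemistry has to be loaded.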

mmundy42 commented 9 years ago

Taking a look at this. I'm going to try building a new ModelSEED biochemistry object from the source data to remove the dependency on KBaseFBAModeling. We'll need the new biochemistry soon to start testing it on models.

cshenry commented 9 years ago

I think all these scripts need to be able to function depending ONLY on the master set of flat files. I think the answer is to write a bit of code to permit the construction of a biochemistry object from our flat files... I'll think on this. It would be useful in other scenarios... and the PATRICStore has some concepts we could probably use for this (e.g. transformation).

On Jul 10, 2015, at 12:34 PM, Mike Mundy notifications@github.com wrote:

Taking a look at this. I'm going to try building a new ModelSEED biochemistry object from the source data to remove the dependency on KBaseFBAModeling. We'll need the new biochemistry soon to start testing it on models.

mmundy42 commented 9 years ago

Yes, currently writing the "bit of code" (well, maybe more than a bit :-)
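A flat-file loader of the kind discussed here could start out as small as the sketch below. It is not the code that was eventually written: `load_table` and `build_biochemistry` are hypothetical names, the result is a plain dictionary rather than a real Biochemistry object, and the column names and sample rows are invented, so the actual master file columns would need to be substituted.

```python
import csv
import io


def load_table(handle, key="id"):
    """Read a tab-separated file with a header row into {key: row_dict}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


def build_biochemistry(compounds_handle, reactions_handle):
    """Assemble a plain-dict biochemistry from two flat files."""
    return {
        "compounds": load_table(compounds_handle),
        "reactions": load_table(reactions_handle),
    }


# Made-up sample rows; the real master files have their own columns.
compounds_tsv = "id\tname\tformula\ncpd_a\tcompound A\tX\ncpd_b\tcompound B\tY\n"
reactions_tsv = "id\tname\tequation\nrxn_1\treaction 1\tcpd_a <=> cpd_b\n"

biochem = build_biochemistry(io.StringIO(compounds_tsv), io.StringIO(reactions_tsv))
```

Because the loader takes open file handles, the same functions work on in-memory strings for testing and on the master files on disk.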

mmundy42 commented 9 years ago

Confirmed the pull request in ProbModelSEED fixes the Update_Reaction_Status.pl script (after a one line change to a use statement).

I would still like to document the process for making updates to the compounds and reactions files. Are the *.master.tsv files always generated by the Print_Master_*_List scripts? Maybe it would help to use updating Hg in the compounds file as an example.

samseaver commented 9 years ago

Yes, quite simply, in this case you'd add a line to compounds.master.mods:

plantdefault formula Hg

Run:

./Print_Master_Compounds_List.pl
./Print_Master_Reactions_List.pl
./Update_Reaction_Status.pl

If you were to create a page somewhere for the purposes of documenting this, I can fill it in with what fields you should use in the mods files.

mmundy42 commented 9 years ago

I started documenting in Biochemistry/README.md. Would be good to have you review and make any corrections.

cshenry commented 9 years ago

Hey Mike,

Trying to run the perl script to test my work, but it won't run. I keep getting this:

ms-probanno
Traceback (most recent call last):
  File "/Users/chenry/code/PATRICClient/bin/../pybin/ms-probanno.py", line 8, in <module>
    from biop3.ProbModelSEED.ProbAnnotationWorker import ProbAnnotationWorker
ImportError: No module named biop3.ProbModelSEED.ProbAnnotationWorker

So I tried running the worker with python. I got this:

python ProbAnnotationWorker.py
Traceback (most recent call last):
  File "ProbAnnotationWorker.py", line 11, in <module>
    from biop3.ProbModelSEED.ProbAnnotationParser import ProbAnnotationParser
ImportError: No module named biop3.ProbModelSEED.ProbAnnotationParser

So I tried running the parser:

python ProbAnnotationParser.py
Traceback (most recent call last):
  File "ProbAnnotationParser.py", line 10, in <module>
    from shock import Client as ShockClient
ImportError: No module named shock

And this is where I'm stuck. I have Shock.py from probanno. Any ideas?

Chris

On Jul 14, 2015, at 6:54 PM, Mike Mundy notifications@github.com wrote:

I started documenting in Biochemistry/README.md. Would be good to have you review and make any corrections.

mmundy42 commented 9 years ago

There should be a shock.py in the deployment/lib directory. And you need to make sure shock.py is the latest version as there were changes in later versions of Shock. Unfortunately there isn't a version number in the file. The shock.py on my system looks like this:

-rw-rw-r-- 1 m097749 microbiome 10684 May 21 14:13 shock.py

I can send you a copy if you don't have it on your system.

After running user-env.sh, the PYTHONPATH variable should be set like this:

$KB_TOP/lib

where $KB_TOP is set to the deployment directory. And then the import should work.
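A quick way to confirm the environment before rerunning the scripts is to check both conditions from Python itself. This is a sketch under assumptions: `on_python_path` and `module_available` are hypothetical helper names, and `importlib.util.find_spec` requires Python 3.4 or later, which may be newer than the interpreter in the deployment discussed here.

```python
import importlib.util
import os
import sys


def on_python_path(directory):
    """Return True if directory is one of the entries in sys.path."""
    target = os.path.abspath(directory)
    # An empty sys.path entry means the current directory.
    return any(os.path.abspath(entry or os.curdir) == target for entry in sys.path)


def module_available(name):
    """Return True if 'import name' would find a top-level module."""
    return importlib.util.find_spec(name) is not None
```

For the setup above, `on_python_path(os.path.join(os.environ["KB_TOP"], "lib"))` and `module_available("shock")` should both be true once user-env.sh has been sourced. Note that this only shows a module named shock can be found, not that it is the latest version.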

mmundy42 commented 9 years ago

@samseaver, how were you thinking of using the is_obsolete flag in the master reaction file? There currently is no equivalent in the Biochemistry or Reaction objects. When building a Biochemistry object, should reactions marked as obsolete not be added?

Also looking at the Print_Master_Reaction_List script. To build the Rxns_Codes() and Codes_Rxns() hashes, a Reaction object is needed to generate the equation code. And a Biochemistry object is required since a Reaction object cannot operate independently. The Rxns_Codes() and Codes_Rxns() hashes are used to find obsolete reactions. But if we don't have a way to handle obsolete reactions, why bother finding them?

Maybe I'm going a little crazy with "chicken and egg" problems i.e. to create the Biochemistry object from the master reaction file you need to create a Biochemistry object.
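To make the role of the two hashes concrete, here is a minimal sketch of grouping reactions by an equation code to spot duplicates. It is not the logic of the Perl script: `equation_code` is a simplified stand-in for the code the Reaction object generates (it ignores compartments and direction), and the function names, reaction ids, and stoichiometry are invented for illustration.

```python
from collections import defaultdict


def equation_code(stoichiometry):
    """Build an order-independent key from {compound_id: coefficient}."""
    return ";".join(f"{coef:g}:{cpd}" for cpd, coef in sorted(stoichiometry.items()))


def find_duplicates(reactions):
    """Group reaction ids by code; return ids that repeat an earlier code."""
    codes_to_rxns = defaultdict(list)
    for rxn_id in sorted(reactions):
        codes_to_rxns[equation_code(reactions[rxn_id])].append(rxn_id)
    # Keep the first id in each group; the rest are candidates for obsolete.
    return sorted(rxn for group in codes_to_rxns.values() for rxn in group[1:])


reactions = {
    "rxn_1": {"cpd_a": -1, "cpd_b": 1},
    "rxn_2": {"cpd_b": 1, "cpd_a": -1},  # same equation, different order
    "rxn_3": {"cpd_a": -1, "cpd_c": 1},
}
duplicates = find_duplicates(reactions)
```

In this simplified form the code is computed from the stoichiometry alone, so no Biochemistry object is involved. Whether the real equation code can be produced that way depends on details not covered in this thread.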

samseaver commented 9 years ago

So, in theory, we don't want to be using the obsolete compounds and reactions in our biochemistry, so they wouldn't be loaded as such. However, as the merger came from the two old databases, we have to check to see whether this means that reactions which would otherwise be present in any of the ModelTemplates would then be missing, which would break the model reconstruction process.

We also need to make a decision about the legacy process: are we keeping the old biochemistry objects around for the models that have already been generated in KBase and ModelSEED?
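The check described above, whether dropping obsolete reactions would remove anything a template relies on, can be sketched as a set intersection. This is an illustration under assumptions: `missing_from_templates` is a hypothetical name, the `"1"` encoding of the is_obsolete flag is assumed, and the reaction and template data are invented.

```python
def missing_from_templates(reactions, templates):
    """Map each template name to the obsolete reaction ids it references."""
    # Assumes is_obsolete is stored as the string "1" for obsolete rows.
    obsolete = {rid for rid, row in reactions.items() if row.get("is_obsolete") == "1"}
    report = {}
    for name, reaction_ids in templates.items():
        hits = sorted(obsolete & set(reaction_ids))
        if hits:
            report[name] = hits
    return report


# Invented sample data for illustration only.
reactions = {
    "rxn_1": {"is_obsolete": "0"},
    "rxn_2": {"is_obsolete": "1"},
    "rxn_3": {"is_obsolete": "1"},
}
templates = {"template_a": ["rxn_1", "rxn_2"], "template_b": ["rxn_1"]}
report = missing_from_templates(reactions, templates)
```

An empty report would mean the obsolete reactions can be left out of the biochemistry without affecting any template. Any entry lists the reactions that need to be remapped first.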

mmundy42 commented 9 years ago

For the legacy process, it would make sense to me to have a version assigned to the objects. The version could either be a number or a date. For example, "master-v1.0.0" or "master-2015a" for the new master biochemistry.

mmundy42 commented 9 years ago

@cshenry and I discussed this morning. The plan is for the biochemistry to be the source for a model template and a model template to be the source for a model. A model will no longer reference a biochemistry object. For ModelSEED, there will be no need for a biochemistry object to be available in the workspace for modeling. For existing models, the old biochemistry objects can be kept since that won't impact new models.