ModelSEED / ModelSEED


Proposal for storing structures in a compound #23

Open samseaver opened 11 years ago

samseaver commented 11 years ago

There are three slightly different molecular structures that any one compound could have. I'm also aware that there is an intention to use different compound uuids to represent the various states of a compound at different pHs, but this issue is an attempt to separate out the different ways in which we can implement this.

My priority at this point in time is that I have three different string codes: 1) The full, untouched InChI string converted from the original mol file (probably hardly ever used, but it must be there for reference purposes). 2) The 'search' InChI string, in which the charge layers are stripped so that compounds can be matched reliably. 3) The 'charged' InChI string, which was converted at pH 7 (this only really affects the charge layer) and is used for reaction balancing.

I was able to print the JSON of a single compound object and see that the object can have an array of multiple structures, each of a different 'type'. My proposal right now, which I have no problem seeing challenged, is to have three structures in the same structures array for a single compound: 1) type:inchi 2) type:inchi_search 3) type:inchi_ph7.
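To make the proposal concrete, here is a rough Python sketch of what the structures array could look like with the three types listed above. Everything except the three type names is an assumption made for illustration: the `data` key, the `strip_charge_layers` and `build_structures` helpers, and the choice to derive the search string by dropping the `/q` (charge) and `/p` (proton) layers at the string level. The pH 7 string is assumed to come from an external conversion step that is not shown.

```python
import json

# InChI layers treated as "charge layers" in this sketch:
# /q is the charge layer, /p is the (de)protonation layer.
CHARGE_LAYER_PREFIXES = ("q", "p")


def strip_charge_layers(inchi: str) -> str:
    """Hypothetical helper: drop the /q and /p layers from an InChI string.

    This is a crude string-level operation, not a chemistry-aware one.
    """
    head, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(CHARGE_LAYER_PREFIXES)]
    return "/".join([head, *kept])


def build_structures(original_inchi: str, ph7_inchi: str) -> list:
    """Hypothetical helper: build the three proposed entries for one compound."""
    return [
        {"type": "inchi", "data": original_inchi},
        {"type": "inchi_search", "data": strip_charge_layers(original_inchi)},
        {"type": "inchi_ph7", "data": ph7_inchi},
    ]


# Example: acetic acid as drawn in the mol file, acetate at pH 7.
acetic_acid = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
acetate = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"

compound = {"name": "acetate", "structures": build_structures(acetic_acid, acetate)}
print(json.dumps(compound, indent=2))
```

With this layout, matching two compounds would compare only their `inchi_search` entries, while reaction balancing would read `inchi_ph7`.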

samseaver commented 11 years ago

I forgot to add that I will do the same with the mol files. But that brings up another question: how are we actually going to store these? As BLOBs?

cshenry commented 11 years ago

So the original plan was to do exactly this. I anticipated having multiple string structures like those you mention above. I also thought we might put molfiles into the structure arrays as text blobs. There is one issue we need to consider, though. Adding all these structures is going to slow down the biochemistry object, and it's going to make it MASSIVE (particularly if we add the molfiles). I'm a bit wary of doing that because the biochemistry gets replicated a lot as you curate it. Scott proposed that we make a separate provenance object called "StructuralBiochemistry" and put structural cues and molecular structures in that object. This would then be linked to the biochemistry just as the biochemistry is linked to the mapping. Most importantly, it would separate molecular structure curation from biochemistry curation, which is probably a good thing.
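A minimal Python sketch of that separation, just to illustrate the idea. Only the name "StructuralBiochemistry" comes from the proposal; the field names, the uuid values, and the use of dataclasses are assumptions, not the actual ModelSEED object definitions.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Structure:
    type: str  # e.g. "inchi", "inchi_search", "inchi_ph7", "mol"
    data: str  # the string code or the molfile text


@dataclass
class StructuralBiochemistry:
    uuid: str
    # Hypothetical layout: compound uuid -> structures for that compound.
    structures: Dict[str, List[Structure]] = field(default_factory=dict)


@dataclass
class Biochemistry:
    uuid: str
    compound_uuids: List[str]
    # Link only: the heavy structure payload lives in the other object.
    structural_biochemistry_uuid: str


structural = StructuralBiochemistry(
    uuid="sb-1",
    structures={"cpd-1": [Structure("inchi", "InChI=1S/H2O/h1H2")]},
)
biochem = Biochemistry(
    uuid="bc-1",
    compound_uuids=["cpd-1"],
    structural_biochemistry_uuid=structural.uuid,
)

# Curating the biochemistry produces a new copy, but only the link is carried
# over; the structures themselves are not duplicated.
curated = replace(biochem, uuid="bc-2")
print(curated.structural_biochemistry_uuid)
```

In this arrangement each replicated biochemistry carries only a uuid pointing at the structural object, so adding molfiles would grow one object rather than every copy.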