SBRG / bigg_models

The BiGG Models website server
http://bigg.ucsd.edu

Universal tables #9

Closed zakandrewking closed 9 years ago

zakandrewking commented 9 years ago
steve-federowicz commented 9 years ago

Hey, I was just talking with Justin about this and we ran into two issues.

First, most models don't have associated KEGG IDs. Ideally they all would, and maybe I'm wrong, but I'm pretty sure only 10-20% of what we load will have them.

Second, we probably need our own non-external unique IDs for universal components, because there are going to be many more types of components than just metabolites. The above scheme would work, but it would produce a massive number of flagged rows.

zakandrewking commented 9 years ago

I can't check on this right now, but I remember going into Simpheny and seeing KEGG ids for almost every metabolite I checked. Definitely the central metabolic ones. I wonder if they never made it into GRMIT?

It's OK to have a massive number of flagged rows. We have to deal with this someday, and there are many automated approaches to consider. Andreas has done something very similar.

For non-metabolites, we should try to come up with external IDs where possible. For anything that's part of a template reaction, we can get fancy. For instance, a transcription elongation reaction could be linked to the reaction template AND to an external gene ID. But we don't have to solve that immediately.

pillmill commented 9 years ago

The KEGG and CAS IDs were imported from Simpheny.

draeger commented 9 years ago

Hi guys,

IMHO, having our own ID scheme in addition to providing references to external KEGG IDs would be a great idea. If BiGG IDs are consistent and unique, other users could refer to us instead of pointing to KEGG etc. It would be very nice if many models contained references to BiGG, ultimately increasing our access count when people look those up.

Cheers Andreas

Dr. Andreas Draeger University of California, San Diego, La Jolla, CA 92093-0412, USA Bioengineering Dept., Systems Biology Research Group, Office #2506 Phone: +1-858-534-9717, Fax: +1-858-822-3120, twitter: @dr_drae

steve-federowicz commented 9 years ago

OK, so how does this sound as a temporary solution?

  1. We start by loading models that we know came from Simpheny and have KEGG ids.
    • This ensures that most of the primary metabolic universal components will have metabolite entries that contain a valid KEGG id.
  2. Since we already have a column for the KEGG id in the metabolite table, a simple select * from metabolite where kegg_id is null; should do the trick for keeping track of metabolites that need curation.
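
The query in step 2 could be sketched roughly like this. It is only an illustration: the in-memory SQLite database, the three-column table, and the sample rows are assumptions made for the example (the real metabolite table surely has more columns), and `metabolites_needing_curation` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical minimal schema; the real metabolite table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metabolite ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, kegg_id TEXT)"
)
conn.executemany(
    "INSERT INTO metabolite (name, kegg_id) VALUES (?, ?)",
    [
        ("D-Glucose", "C00031"),
        ("Pyruvate", "C00022"),
        ("example uncurated metabolite", None),  # made-up row without a KEGG id
    ],
)


def metabolites_needing_curation(conn):
    """Return (id, name) for every metabolite that lacks a KEGG id."""
    rows = conn.execute(
        "SELECT id, name FROM metabolite WHERE kegg_id IS NULL ORDER BY id"
    )
    return rows.fetchall()


flagged = metabolites_needing_curation(conn)
print(flagged)
```

A NULL kegg_id acts as the "flag" here, so no separate flag column is needed for this temporary solution.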

What do you think?

Actually, on re-reading, this is essentially exactly what you originally proposed?

zakandrewking commented 9 years ago

Bingo :8ball: (don't read into that)

jslu9 commented 9 years ago

Yeah, right now I've made it so that a universal metabolite with a missing KEGG id gets updated when another metabolite with the same name and a KEGG id is uploaded into the database.
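
A rough sketch of that backfill rule, not the actual loading code: `UniversalMetabolite`, `load_metabolite`, and the dict keyed by exact name are all assumptions made for illustration, and "C99999" is a made-up id used only to show that an existing id is not overwritten.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UniversalMetabolite:
    name: str
    kegg_id: Optional[str] = None


def load_metabolite(
    universal: Dict[str, UniversalMetabolite],
    name: str,
    kegg_id: Optional[str],
) -> UniversalMetabolite:
    """Add a metabolite, or backfill a missing KEGG id on an existing one."""
    entry = universal.get(name)
    if entry is None:
        # First time this name is seen: create the universal entry.
        entry = UniversalMetabolite(name=name, kegg_id=kegg_id)
        universal[name] = entry
    elif entry.kegg_id is None and kegg_id is not None:
        # Same name, and the new upload carries an id the entry lacks.
        entry.kegg_id = kegg_id
    return entry


universal: Dict[str, UniversalMetabolite] = {}
load_metabolite(universal, "Pyruvate", None)      # loaded without an id
load_metabolite(universal, "Pyruvate", "C00022")  # backfills the id
load_metabolite(universal, "Pyruvate", "C99999")  # made-up id, ignored
```

Matching on the exact name is the weak point of this approach: two models that spell the same metabolite differently would still produce two universal entries.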

zakandrewking commented 9 years ago
jslu9 commented 9 years ago

Just talked to Jon about putting KEGG ids into his newly updated SBML models, and he said that he personally doesn't think KEGG ids are good universal ids. He mentioned that MetaNetX is better.

steve-federowicz commented 9 years ago

Hmm, OK, I would be fine with MetaNetX. I was always pretty impressed with their atom mapping work. I'm not sure whether it has been fully implemented within MetaNetX yet, but I think it will be, and at that point it will be a very sophisticated resource. The downside is that KEGG has a lot of visibility outside of constraint-based systems biology, so if we go with MetaNetX ids we potentially lose some visibility. The upside is that Costas and MetaNetX aren't going anywhere and will only continue to get better. Costas's lab is also friendly, so if an official collaboration needed to happen or larger things were to move forward, it would likely be a good situation.

zakandrewking commented 9 years ago

We aren't technically using KEGG ids as universal ids: we are using KEGG to generate universal BiGG ids. So we don't have to limit ourselves to one type of external reference ID. It's worth thinking about this more.

Does Jon already have MetaNetX ids for his models?
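
To make the "more than one type of external reference" idea concrete, here is a minimal sketch. The `UniversalComponent` class and its methods are hypothetical and do not reflect the actual schema; the point is only that a BiGG id can carry any number of references keyed by source, so KEGG, CAS, and MetaNetX can coexist.

```python
from collections import defaultdict


class UniversalComponent:
    """A universal BiGG entry with references to several external databases."""

    def __init__(self, bigg_id):
        self.bigg_id = bigg_id
        # Maps a lowercase source name (e.g. "kegg") to a set of external ids.
        self.external_ids = defaultdict(set)

    def add_reference(self, source, external_id):
        self.external_ids[source.lower()].add(external_id)

    def missing(self, source):
        """True if no id from the given source has been recorded yet."""
        return not self.external_ids.get(source.lower())


pyr = UniversalComponent("pyr")
pyr.add_reference("KEGG", "C00022")
pyr.add_reference("CAS", "127-17-3")

print(pyr.missing("kegg"))      # a KEGG id is recorded
print(pyr.missing("metanetx"))  # nothing recorded yet, so still needs curation
```

With this shape, "needs curation" becomes a per-source question rather than a single null check on one KEGG column.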

jslu9 commented 9 years ago

I don't think so, actually. He might have some, but they are definitely not in his SBML files. But today I discussed with Jon pulling out the KEGG ids and CAS numbers and putting them into his cobrapy objects. He'll send his new models (with KEGG ids) once he's done updating his Python script.

draeger commented 9 years ago

Let's talk about all this on Tuesday during code talk. I think this is very important and deserves a few words of direct discussion.
