SysBioChalmers / Human-GEM

The generic genome-scale metabolic model of Homo sapiens
https://sysbiochalmers.github.io/Human-GEM-guide/
Creative Commons Attribution 4.0 International

Typo in the name of MAM02053 #491

Open pranasag opened 1 year ago

pranasag commented 1 year ago

Current behavior:

Metabolite MAM02053 is called henicosanoic acid, while the correct name is heneicosanoic acid. The CoA-bound form MAM02052 is named correctly.

For now I can fix this single typo, of course, but is some sort of curation of metabolite names (e.g. based on ChEBI information) planned for the future? This one came up by accident, and I guess there might be more issues like this.

haowang-bioinfo commented 1 year ago

@pranasag thanks, please go ahead to fix

ChEBI-based metabolite name curation is a good idea, would you like to try it?

pranasag commented 1 year ago

@haowang-bioinfo I'll give it a thought, will come back at some point soon.

haowang-bioinfo commented 1 year ago

@haowang-bioinfo I'll give it a thought, will come back at some point soon.

very good!

feiranl commented 1 year ago

Nice!

pranasag commented 1 year ago

Since I submitted a fix to the single issue of MAM02053 today, I would like to wake up the discussion on ChEBI-based curation.

It seems that acquiring the names for model metabolites with ChEBI identifiers is really easy using libChEBI. I have briefly looked at the potential mismatches, and there are quite a few. Many of them are minor differences (e.g. "18-(R)-" vs "(18R)-"), but we're talking about a couple of hundred entries (can attach the Jupyter notebook and output csv btw). @haowang-bioinfo what are your suggestions on how to proceed?
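As a rough sketch of how the minor differences could be separated from the substantive ones: strip case and punctuation before comparing, so that pairs like "18-(R)-" vs "(18R)-" collapse to the same key. The function names and the three labels are made up for illustration, and this is not necessarily what the notebook does.

```python
import re


def normalize(name: str) -> str:
    """Lowercase a name and keep only ASCII letters and digits.

    Note: non-ASCII characters (e.g. Greek letters) are dropped as well.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


def classify(model_name: str, chebi_name: str) -> str:
    """Label a name pair as 'identical', 'minor', or 'substantive'."""
    if model_name == chebi_name:
        return "identical"
    if normalize(model_name) == normalize(chebi_name):
        # Only case, punctuation, or spacing differs.
        return "minor"
    return "substantive"


# Hypothetical name pair built around the "18-(R)-" vs "(18R)-" example.
label = classify("18-(R)-HEPE", "(18R)-HEPE")
```

A real typo such as "henicosanoic acid" vs "heneicosanoic acid" still differs after normalization, so it would land in the "substantive" group for manual review.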

haowang-bioinfo commented 1 year ago

we're talking about a couple of hundred entries (can attach the Jupyter notebook and output csv btw).

I think this is a very good idea, go ahead please

pranasag commented 1 year ago

I have attached the output of the ChEBI names I've parsed (based on the metabolites.tsv file in the model folder), metabolitesWithChEBInames.csv, and the sheet with names that do not match those in Human-GEM, chebiHumanGEMdiff.xlsx. I have scrolled through the list (it's quite long) and, to be honest, in many cases I'd keep the present name from the model (e.g. O2 vs dioxygen).
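A minimal sketch of this kind of comparison, assuming a tab-separated table with hypothetical columns `id`, `name` and `chebi`, plus a mapping from ChEBI ID to ChEBI name obtained elsewhere (e.g. via libChEBI). The column names, the placeholder ChEBI IDs, the `M_O2` identifier and the `keep` set are assumptions for illustration, not the actual layout of metabolites.tsv.

```python
import csv
import io


def find_mismatches(tsv_text, chebi_names, keep=frozenset()):
    """Return (id, model name, ChEBI name) for every differing name.

    tsv_text    -- table with hypothetical columns 'id', 'name', 'chebi'
    chebi_names -- dict mapping ChEBI ID to the ChEBI name
    keep        -- model names to retain even if ChEBI disagrees
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mismatches = []
    for row in rows:
        chebi_name = chebi_names.get(row["chebi"])
        if chebi_name is None:  # no ChEBI ID, or ID not in the mapping
            continue
        if row["name"] in keep or row["name"] == chebi_name:
            continue
        mismatches.append((row["id"], row["name"], chebi_name))
    return mismatches


# Placeholder ChEBI IDs and a placeholder ID for O2, for illustration only.
table = (
    "id\tname\tchebi\n"
    "MAM02053\thenicosanoic acid\tCHEBI:0000001\n"
    "M_O2\tO2\tCHEBI:0000002\n"
)
names = {"CHEBI:0000001": "heneicosanoic acid", "CHEBI:0000002": "dioxygen"}
mismatches = find_mismatches(table, names, keep={"O2"})
```

The `keep` set is one way to record names such as O2 that should stay as they are in the model, so they do not reappear in every later diff.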

I should also note that many of the "pool" metabolites have ChEBI identifiers assigned, and they pop up quite frequently in the attached Excel file. I'm not a big fan of this outcome, as pool metabolites are fictional constructs that simplify modeling for us, not real (bio)chemical entities.

What should we do next?

haowang-bioinfo commented 1 year ago

Very good - will come back to you after checking out

feiranl commented 1 year ago

We may also need to adjust the ChEBI IDs in the model.

haowang-bioinfo commented 1 year ago

We may also need to adjust the ChEBI IDs in the model

yes, just do it when needed

haowang-bioinfo commented 1 year ago

I have attached the output of the ChEBI names I've parsed (based on the metabolites.tsv file in the model folder), metabolitesWithChEBInames.csv, and the sheet with names that do not match those in Human-GEM, chebiHumanGEMdiff.xlsx.

@pranasag great work, and this is a step in the right direction

What should we do next?

how about this:

  1. start a new branch, upload the extracted csv file to "~/data/modelCuration/";
  2. update the model by changing the metabolite names that you think should be changed, i.e. do the clear-cut ones first

mihai-sysbio commented 1 year ago

@pranasag it would be great to add a TSV file to the repository instead of the binary Excel file (.xlsx). A good place for this file would be /data/modelCuration.
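For illustration, a tab-separated file can be produced with Python's standard `csv` module alone. The column headers and the example row below are placeholders, not the contents of the attached sheet.

```python
import csv
import io


def to_tsv(header, rows):
    """Serialize rows as tab-separated text with Unix line endings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder header and row, for illustration only.
tsv_text = to_tsv(
    ["id", "model_name", "chebi_name"],
    [["MAM02053", "henicosanoic acid", "heneicosanoic acid"]],
)
```

To write to disk instead, the same writer can be attached to a file opened with `open(path, "w", newline="")`. Plain text like this shows up line by line in Git diffs, which a binary .xlsx file does not.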

pranasag commented 1 year ago

@mihai-sysbio thanks for the tip!