Open pranasag opened 1 year ago
@pranasag thanks, please go ahead and fix it.
ChEBI-based metabolite name curation is a good idea, would you like to try it?
@haowang-bioinfo I'll give it a thought, will come back at some point soon.
very good!
Nice!
Since I submitted a fix to the single issue of MAM02053 today, I would like to wake up the discussion on ChEBI-based curation.
It seems that acquiring the names for model metabolites with ChEBI identifiers is really easy using libChEBI. I have briefly looked at the potential mismatches, and there are quite a few. Many of them are minor differences (e.g. "18-(R)-" vs "(18R)-"), but we're talking about a couple of hundred entries (I can attach the Jupyter notebook and output csv, btw). @haowang-bioinfo what are your suggestions on how to proceed?
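To separate the minor differences from the real ones, something like the following could be used. This is only a rough sketch: the function names and the regular expression are my own assumptions, the example strings are illustrative, and it only handles the "18-(R)-" vs "(18R)-" pattern mentioned above.

```python
import re

# Matches a locant followed by a bracketed stereo-descriptor, e.g. "18-(R)-".
# Assumption: only single-letter descriptors R, S, E and Z are handled.
_LOCANT_STEREO = re.compile(r"(\d+)-\(([RSEZ])\)-")


def normalise(name: str) -> str:
    """Rewrite '18-(R)-' as '(18R)-', then ignore case and outer whitespace."""
    return _LOCANT_STEREO.sub(r"(\1\2)-", name.strip()).lower()


def classify(model_name: str, reference_name: str) -> str:
    """Label a pair of names as 'identical', 'minor' or 'different'."""
    if model_name == reference_name:
        return "identical"
    if normalise(model_name) == normalise(reference_name):
        return "minor"
    return "different"


# Illustrative strings only, not taken from the attached files.
print(classify("18-(R)-HEPE", "(18R)-HEPE"))
print(classify("O2", "dioxygen"))
```

Pairs labelled "minor" could probably be handled in bulk, while "different" pairs would still need a manual look.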
we're talking about a couple of hundred entries (I can attach the Jupyter notebook and output csv, btw).
I think this is a very good idea, go ahead please
I have attached the output of the ChEBI names I've parsed (based on the metabolites.tsv file in the model folder) metabolitesWithChEBInames.csv and the sheet with names which do not match in Human-GEM chebiHumanGEMdiff.xlsx. I have scrolled through the list (and it's quite big), and to be honest, in many cases I'd keep the present name from the model (e.g. O2 vs dioxygen).
I should also note that many of the "pool" metabolites have ChEBI identifiers assigned, and they pop up quite frequently in the attached Excel file. I'm not a big fan of this outcome, as pool metabolites are fictions that simplify modeling for us, not real (bio)chemical entities.
What should we do next?
Very good - will come back to you after checking out
We may also need to adjust the ChEBI IDs in the model.
We may also need to adjust the ChEBI IDs in the model
yes, just do so when needed
I have attached the output of the ChEBI names I've parsed (based on the metabolites.tsv file in the model folder) metabolitesWithChEBInames.csv and the sheet with names which do not match in Human-GEM chebiHumanGEMdiff.xlsx.
@pranasag great work, and this is a step in the right direction
What should we do next?
how about this:
csv file to "~/data/modelCuration/";
@pranasag it would be great if instead of adding the binary Excel file (.xlsx) to the repository a TSV file would be used. A good place for this file would be /data/modelCuration.
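For what it's worth, writing the mismatch table as TSV needs nothing beyond the Python standard library. A minimal sketch follows; the function name and the column headers are hypothetical and would have to be adapted to whatever the notebook actually produces.

```python
import csv
import io


def write_mismatches_tsv(rows, handle):
    """Write (ID, model name, reference name) rows as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Hypothetical column headers, chosen for illustration only.
    writer.writerow(["metsID", "modelName", "referenceName"])
    for row in rows:
        writer.writerow(row)


# Usage with an in-memory buffer; a real run would pass an open file instead.
buffer = io.StringIO()
write_mismatches_tsv(
    [("MAM02053", "henicosanoic acid", "heneicosanoic acid")],
    buffer,
)
print(buffer.getvalue(), end="")
```

A plain-text table like this also shows up line by line in diffs, which the binary .xlsx does not.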
@mihai-sysbio thanks for the tip!
Current behavior:
Metabolite MAM02053 is called henicosanoic acid, while the correct name is heneicosanoic acid. The CoA-bound form MAM02052 is named correctly.
For now I can fix this single typo, of course, but is some sort of curation (e.g. based on ChEBI information) for metabolite names planned in the future? This one came out by accident, and I guess there might be more issues like this.
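A possible way to hunt for similar typos, as a sketch only: compare each model name against a reference name and flag pairs that are almost, but not exactly, equal. The function names and the 0.9 threshold are assumptions of mine, and where the reference names come from (e.g. ChEBI) is left open here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_typo(model_name: str, reference_name: str, threshold: float = 0.9) -> bool:
    """Flag names that differ but are nearly the same (threshold is arbitrary)."""
    if model_name == reference_name:
        return False
    return similarity(model_name, reference_name) >= threshold


# The pair from this issue is flagged; a genuinely different name is not.
print(likely_typo("henicosanoic acid", "heneicosanoic acid"))
print(likely_typo("O2", "dioxygen"))
```

Anything flagged this way would still need a manual check before renaming, since a high similarity does not say which of the two spellings is the right one.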