cdanielmachado / carveme

CarveMe: genome-scale metabolic model reconstruction

Construction universal model CM #20

Closed acenieuw closed 6 years ago

acenieuw commented 6 years ago

Hi Daniel,

As mentioned before, I am currently in the process of comparing pipelines to construct metabolic models. In order to perform a fair benchmark, I want to export the universal model of CarveMe and add GPRs to it.

I have two questions regarding the CarveMe universal model construction. The first: which models are considered, and which genes go into the universal model? The paper states that reactions and metabolites associated with non-bacterial compartments are removed. However, when I look in the notes field of reactions and metabolites, it mentions yeast and other non-bacterial models.

The second question is how the BiGG_GPRS file and the BiGG fasta file are linked. For example, gene 1503_AT1 from RECON1 is in the GPRS file, but I cannot find it in the fasta file. The same goes for the E. coli gene E2348C_RS01240. Could you elaborate on this?

Thanks in advance!

cdanielmachado commented 6 years ago

Hi @acenieuw

Regarding your first question: we remove only the metabolites and reactions that are exclusively eukaryotic. So it is normal that many metabolites in the bacterial universe also appear in eukaryotic models (ATP, for example).

Regarding the second question, the BiGG GPRs file was generated by extracting GPR associations from all models in BiGG (version 1.3).

The BiGG fasta file was generated by extracting the AA sequences directly from NCBI.

Note that BiGG (version 1.5) now includes the DNA and AA sequences in the gene annotations, but we have not yet updated our universe building script to account for this. When extracting the sequences we skipped some of the eukaryotic models (like Recon1), which is why you don't find Recon1 genes in the fasta file. There were also a few sporadic cases where the download of a particular gene sequence failed, which would explain why the E2348C_RS01240 gene from the iE2348C_1286 model is missing.
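To see which GPR genes have no sequence, one option is to compare the gene IDs from the GPRs file against the FASTA headers. The following is a minimal sketch, not CarveMe code: the helper names `fasta_ids` and `missing_genes` are hypothetical, the toy data is made up, and it assumes each FASTA header starts with the gene ID as its first token, which may not match the real file.

```python
def fasta_ids(fasta_text):
    """Collect sequence identifiers: the first token after '>' on each header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_genes(gpr_genes, fasta_text):
    """Return the GPR gene IDs that have no header in the FASTA text, sorted."""
    return sorted(set(gpr_genes) - fasta_ids(fasta_text))


# Hypothetical toy data; the real files may use different ID formats.
example_fasta = ">b0001 example protein\nMKRISTTITTT\n>b0002\nMRVLKFGGTS\n"
example_gpr_genes = ["b0001", "b0002", "1503_AT1", "E2348C_RS01240"]

print(missing_genes(example_gpr_genes, example_fasta))
```

Genes reported as missing would then fall into the two cases described above: genes from skipped eukaryotic models, or genes whose download failed.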

acenieuw commented 6 years ago

Thanks Daniel for replying so quickly.

"We remove metabolites and reactions..."

Which models do you exclude, precisely? In other words, which models do you use to create the universal model, and how do you determine whether a reaction or metabolite is exclusively eukaryotic? As mentioned, I would like to add GPRs to all reactions in the universal model, but to do this I need to know which models you consider. Would it be possible to provide me with a list of the models you have used?

cdanielmachado commented 6 years ago

The universal model is not built from any particular models.

Our universe is initially built by downloading the universal list of BiGG reactions (http://bigg.ucsd.edu/api/v2/universal/reactions). The JSON structure you see here is used to create the SBML file.
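As a rough illustration of this first step, the sketch below downloads the list and pulls out the reaction identifiers. It is not the CarveMe build script. It assumes the payload is a JSON object with a `results` list whose entries carry a `bigg_id` key; the function names and the offline sample are hypothetical.

```python
import json
from urllib.request import urlopen

UNIVERSAL_REACTIONS_URL = "http://bigg.ucsd.edu/api/v2/universal/reactions"


def fetch_universal_reactions(url=UNIVERSAL_REACTIONS_URL):
    """Download and decode the JSON payload (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def reaction_ids(payload):
    """Extract reaction IDs, assuming a 'results' list of objects with a 'bigg_id' key."""
    return [entry["bigg_id"] for entry in payload.get("results", [])]


# Offline sample that mimics the assumed payload shape.
sample = json.loads(
    '{"results_count": 2, "results": ['
    '{"bigg_id": "PGI", "name": "Glucose-6-phosphate isomerase"}, '
    '{"bigg_id": "PFK", "name": "Phosphofructokinase"}]}'
)
print(reaction_ids(sample))
```

Calling `reaction_ids(fetch_universal_reactions())` would do the same against the live endpoint, provided the assumed payload shape holds.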

Then comes the curation phase, where the "draft universe" is used to create a "curated bacterial universe". This is when we remove purely eukaryotic reactions/metabolites from the bacterial model. This is done by looking into the annotations field of each reaction/metabolite (which contains references to the models in which the reaction/metabolite appears) and then mapping those models to this list:

https://github.com/cdanielmachado/carveme/blob/master/carveme/data/input/bigg_models.csv
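The filtering rule described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the actual curation code: the column names `model` and `domain`, the label `eukaryote`, the reaction IDs, and both helper functions are hypothetical, and the real CSV may be laid out differently. Entries with no known model references are kept here, which is also an assumption.

```python
import csv
import io


def load_domains(csv_text, id_column="model", domain_column="domain"):
    """Map each model ID to its domain label (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row[domain_column] for row in reader}


def is_exclusively_eukaryotic(models, domain_by_model, eukaryote_label="eukaryote"):
    """True only if at least one model is known and every known model is eukaryotic."""
    known = [domain_by_model[m] for m in models if m in domain_by_model]
    return bool(known) and all(d == eukaryote_label for d in known)


# Toy table and toy annotations; reaction IDs are placeholders.
toy_csv = "model,domain\niJO1366,bacteria\niMM904,eukaryote\nRECON1,eukaryote\n"
toy_reactions = {
    "RXN_A": ["iJO1366", "iMM904"],  # shared with a bacterial model: keep
    "RXN_B": ["iMM904", "RECON1"],   # only eukaryotic models: remove
    "RXN_C": [],                     # no model references: kept by assumption
}

domains = load_domains(toy_csv)
kept = [rid for rid, models in toy_reactions.items()
        if not is_exclusively_eukaryotic(models, domains)]
print(kept)
```

The same check would apply to metabolites, using their model references instead of those of the reactions.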