Closed wshao1 closed 10 months ago
The problem is that ecHumanGEM
should not have multiple protein usage reactions for the same protein (e.g. usage_prot_Q0WX57
). This should normally have been prevented by #349, but ecHumanGEM uses the uniprotConversion.tsv
that might still result in duplicate proteins.
@wshao1 How do you generate ecHumanGEM
? With the following code a draft model is generated, but the duplicate usage protein reaction cannot be found:
adapterLocation = fullfile(findGECKOroot,'tutorials','light_ecModel','HumanGEMAdapter.m');
adapter = ModelAdapterManager.setDefault(adapterLocation);
model = loadConventionalGEM();
[ecModel, noUniprot] = makeEcModel(model,false);
find(strcmp(ecModel.rxns,'usage_prot_Q0WX57'))
ans =
0×1 empty double column vector
The only differences between my adapter and the one provided in the light_ecModel
tutorial:
HumanGEMAdapter.m
):% The model distributed with the light_ecModel tutorial is Human-GEM
% is version 1.15.0, available from
% https://github.com/SysBioChalmers/Human-GEM/releases/tag/v1.3.0
% In addition, the following lines were run to reduce its size
% before storing it in this GECKO tutorial:
% ihuman = simplifyModel(ihuman,false,false,true,true);
% [ihuman.grRules,skipped] = simplifyGrRules(ihuman.grRules,true);
% ihuman = deleteUnusedGenes(ihuman);
% ihuman = rmfield(ihuman,{'unconstrained','rxnReferences','rxnFrom','metFrom','rxnConfidenceScores'});
% ihuman.name = ihuman.id;
% writeYAMLmodel(ihuman,'human-GEM.yml')
It is unclear to me why this simplification is required, and I could not find the simplifyGrRules
function in the RAVEN toolbox repository to figure out what was actually happening, so I skipped this step. I just tried running it right now, and it seems like the function cannot be found: Unrecognized function or variable 'simplifyGrRules'.
I can replicate this with Human-GEM v1.16. The problem is with reaction MAR09490 whose gene association contains multiple isoenzymes that are all annotated to the same UniProt ID.
As a work-around, the duplicated usage_prot_XXXXXX
reactions can safely be deleted. The following code keeps only one instance of the duplicated reactions:
% Make a list of unique reactions
[~,uniqueRxns] = unique(ecModel.rxns);
duplRxns = ~ismember(1:numel(ecModel.rxns), uniqueRxns);
% Remove from all reaction-related model fields
ecModel.rxns(duplRxns) = [];
ecModel.rxnNames(duplRxns) = [];
ecModel.S(:,duplRxns) = [];
ecModel.lb(duplRxns) = [];
ecModel.ub(duplRxns) = [];
ecModel.rev(duplRxns) = [];
ecModel.c(duplRxns) = [];
ecModel.grRules(duplRxns) = [];
ecModel.rxnGeneMat(duplRxns,:) = [];
ecModel.rxnFrom(duplRxns) = [];
The above code should work, but I haven't tested the resulting model with getSubsetEcModel
as how you use it.
In the meanwhile, we'll work on changes to makeEcModel
to prevent this from happening. It is not super trivial how to fix this, as it also means that there are duplicate entries in ecModel.ec.enzymes
(while their corresponding ecModel.ec.genes
and ecModel.ec.rxnEnzMat
entries are unique). It is therefore probably best to allow for duplicate enzymes (but not genes) in ecModel.ec
, but this means it needs to be checked that this does not cause any issues when e.g. running selectKcatValue
or constrainEnzConcs
.
Regarding simplifyGrRules
, this function is from Human-GEM and it's useful to ensure that the grRules are defined in an expanded format so makeEcModel
can correctly split reactions by isoenzymes (done by RAVEN's expandModel
). RAVEN's standardizeGrRules
is also able to do this. We might want to reconsider how makeEcModel
handles this as well. However, this is unrelated to the problem encountered in this Issue.
Thanks @edkerk, removing the duplicated reactions allows getSubsetEcModel
to run.
The resulting model shows zero growth rate however.
Growth rate in HCT116-GEM: 86.6224865446 /hour.
Growth rate in ecHuman-GEM: 0.0871287888 /hour.
Growth rate in ecHCT116-GEM: 0.0000000000 /hour.
I have also tried extracting an ecHT29 using the HT29-GEM from the light_ecModel
tutorial, but that also yields zero growth rate.
Growth rate in HT29-GEM: 149.9672040992 /hour.
Growth rate in ecHuman-GEM: 0.0871287888 /hour.
Growth rate in ecHT29-GEM: 0.0000000000 /hour.
Integrating proteomics data and then flexibilizing enzyme concentrations results in the following error, suggesting getSubsetEcModel may have returned a non-functional model.
No (more) limiting enzymes have been found. Attempting to increase protein pool exchange...
Protein pool exchange was also not limiting. Inability to reach growth rate is not related to
enzyme constraints. Maximum growth rate is 0.
I will investigate this further.
The context-specific ecModel failing to simulate is likely the result of the large ecModel and the context-specific non-ec GEM that are used as input by getSubsetEcModel
are not derived from the same starting GEM. The latter is essential, as only then can a reliable subset of reactions be selected.
This is now clarified in the function documentation, and getSubsetEcModel
tries to prevent such cases by at least checking that all the reactions from the context-specific non-ec GEM can be found in the large ecModel. This is not fool-proof, as other curations could have been done while keeping the same reaction identifiers, but a full check on whether both models have derived from the same starting GEM is outside of the scope.
Description of the bug:
I have an ecHumanGEM created using the full_ecModel protocol, which I want to reduce to a context-specific GEM using
getSubsetEcModel
. However, this returns an error due to the ecHumanGEM having multiple protein usage reactions with the same ID (i.e. 'usage_prot_Q0WX57'). This results in the following traceback:Reproducing these results:
System information
I hereby confirm that:
main
branch of the repository.