bug: applyKcatConstraints may give NaN in S-matrix of light ecModel

Description of the bug:

The following code (see below) exposes two problems:

We get NaNs in the S matrix
The constraints are not right - the full and light model get totally different growth rates, they should theoretically be the same, although some small differences can be accepted due to uncertainties in the solver.

Reproducing these results:

Run this - the growth rate is way too low for light

adapterLocation = fullfile(findGECKOroot,'tutorials','light_ecModel','HumanGEMAdapter.m');
adapter = ModelAdapterManager.setDefault(adapterLocation); 

model = load('C:/Work/MatlabCode/components/human-GEM/Human-GEM_1_12/Human-GEM/model/Human-GEM.mat').ihuman;

%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Light
%%%%%%%%%%%%%%%%%%%%%%%%%%%
[ecModelLt, noUniprot] = makeEcModel(model,true); %warning is ok
ecModelLt         = getECfromGEM(ecModelLt);
kcatList_fuzzy_lt  = fuzzyKcatMatching(ecModelLt);

ecModelLt = selectKcatValue(ecModelLt,kcatList_fuzzy_lt);
ecModelLt = applyKcatConstraints(ecModelLt);
ecModelLt = setProtPoolSize(ecModelLt);
find(isnan(ecModelLt.S(:,strcmp(ecModelLt.rxns,'MAR08263'))))
%ecModelLt.S(isnan(ecModelLt.S)) = 0;
slt = solveLP(ecModelLt,1)

%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Full
%%%%%%%%%%%%%%%%%%%%%%%%%%%

[ecModelFull, noUniprot] = makeEcModel(model,false); %warning is ok
ecModelFull         = getECfromGEM(ecModelFull);
kcatList_fuzzy_full  = fuzzyKcatMatching(ecModelFull);

ecModelFull = selectKcatValue(ecModelFull,kcatList_fuzzy_full);
ecModelFull = applyKcatConstraints(ecModelFull);
ecModelFull = setProtPoolSize(ecModelFull);
%ecModelFull.S(isnan(ecModelFull.S)) = 0;
sfull = solveLP(ecModelFull,1)

If I replace the following code with the previous implementation (see below), it works as it should, the growth rates are close to identical.

current (that doesn't work):

    % Map ec-reactions to model.rxns
    [hasEc,~]   = ismember(model.rxns,modRxns);
    [~,rxnIdx]  = ismember(modRxns,model.rxns);
    hasEc = find(hasEc & updateRxns);
    for i = 1:numel(hasEc)
        % Get all isozymes per reaction
        ecIdx = find(rxnIdx == hasEc(i));
        % Multiply enzymes with their MW (they are then automatically
        % summed per reaction), and divide by their kcat, to get a vector
        % of MW/kcat values.
        MWkcat = (model.ec.rxnEnzMat(ecIdx,:) * model.ec.mw) ./ model.ec.kcat(ecIdx);
        % If kcat was zero, MWkcat is Inf. Correct back to zero
        MWkcat(abs(MWkcat)==Inf)=0;
        if isnan(MWkcat)
            apa = 17;
        end
        % Select the lowest MW/kcat (= most efficient), and convert to /hour
        model.S(prot_pool_idx, hasEc(i)) = -min(MWkcat/3600);
    end

old code (that works, but is slower):

    nextIndex = 1;
    for i = 1:numel(model.rxns)
        if nextIndex <= length(model.ec.rxns) && strcmp(model.rxns(i), modRxns(nextIndex))
            updated = false;
            newMWKcats=[];
            %Reactions with isozymes will have several rows in ec.rxns etc.
            %Loop through all reactions, and for each reaction look at all isozymes
            %Then use the lowest MW/kcat among the isozymes - when optimizing, that
            %is the one that will be used anyway.
            while true
                if updateRxns(i)
                    updated = true;
                end
                enzymes = find(model.ec.rxnEnzMat(nextIndex,:));
                if model.ec.kcat(nextIndex) == 0
                    %newMWKcats = [newMWKcats 0]; - if some are missing, just ignore those
                else
                    MW = sum(model.ec.mw(enzymes) .* model.ec.rxnEnzMat(nextIndex,enzymes).'); %multiply MW with number of subunits
                    newMWKcats = [newMWKcats MW/model.ec.kcat(nextIndex)]; %Light model: protein usage is MW/kcat
                end                
                nextIndex = nextIndex + 1;
                if  nextIndex > length(model.ec.rxns) || ~strcmp(model.rxns(i), modRxns(nextIndex))
                    break;
                end
            end
            if (updated)
                if isempty(newMWKcats)
                    newMWKcats = 0; %no cost
                end
                %Light model: always use the "cheapest" isozyme, that is what the 
                %optimization will choose anyway unless isozymes are individually constrained.
                model.S(prot_pool_idx, i) = -min(newMWKcats)/3600; % 3600: per second -> per hour
            end
        end
    end

System information

I hereby confirm that:

[X] My GECKO installation met all requirements.
[X] This bug occurs in the main branch of the repository.
[X] A similar issue does not already exist.

@edkerk would be good if you could take a look at this, I think you introduced this new code? I am very short on time right now. Replacing with the old code could be an intermediate fix that goes pretty quickly to do.

The linked PR solves the issue with NaN.

Regarding the growth rate difference (0.1428 for light and 0.1443 for full), this is irrespective of whether the old or new applyKcatConstraints implementation is used. The unsimilar growth rate is contradicting the comparison that is shown by running tutorials\full_ecModel\code\plotlightVSfull.m.

The difference is that the tutorial compares light and full ecModels of the realistically constrained yeast-GEM, while the code here uses the not-contextualized human-GEM which has many open exchange reactions. While not thoroughly reviewed, I anticipate that these unrealistic constraints on exchange reactions causes the difference between the two model formats.

Yep, difficult to tell if this is just because the accuracy of the solver causes problems somehow, or if there is a bug somewhere in the code. A way to test it may be to experiment with closing all but the minimum exchange reactions and see if it matters. It could also be that there are some very small fluxes somewhere in the model and that this is something that doesn't happen in Yeast-GEM. But would be good to sort it out. The problem could be either in the light or in the full - if it is a solver issue it is more likely in the full, numerical issues were the reason I created the light in the first place. It should be better now since we rescaled things, but it would be good to figure out. We could also just check the magnitude of all fluxes in light and full and see if there are any really small ones, like < 10^-9.

Yep, difficult to tell if this is just because the accuracy of the solver causes problems somehow, or if there is a bug somewhere in the code. A way to test it may be to experiment with closing all but the minimum exchange reactions and see if it matters. It could also be that there are some very small fluxes somewhere in the model and that this is something that doesn't happen in Yeast-GEM. But would be good to sort it out. The problem could be either in the light or in the full - if it is a solver issue it is more likely in the full, numerical issues were the reason I created the light in the first place. It should be better now since we rescaled things, but it would be good to figure out. We could also just check the magnitude of all fluxes in light and full and see if there are any really small ones, like < 10^-9.

I also imagine that it is because of numerical issues, but it is hard to investigate if so many exchange reactions are open. Setting realistic flux constraints is not trivial though. If we have a case to check this it would be good to revisit, in a separate Issue. The current Issue of the NaN values is resolved.

SysBioChalmers / GECKO