SysBioChalmers / ecModels

A container for all enzyme constrained models created by GECKO.
Creative Commons Attribution 4.0 International
10 stars 6 forks source link

Automation of ecModels pipeline #19

Closed IVANDOMENZAIN closed 3 years ago

IVANDOMENZAIN commented 4 years ago

Complete automation of the ecModels pipeline for the following models:

IVANDOMENZAIN commented 4 years ago

This is currently being developed in a fork repository available here.

mihai-sysbio commented 4 years ago

KmxGEM was missing the .xml on master. Now it exists, so let's see how the auto-run pipeline behaves.

iYali has a problem, perhaps my GECKO fork should be updated?

INFO:__main__:
        cd geckomat
        model = load('/home/jenkins/ecModels-dependencies/Yali-GEM/ModelFiles/mat/iYali.mat');
        model = model.model;
        modelname = 'ecYaliGEM';
        [ecModel, ecModel_batch] = enhanceGEM(model,'COBRA', modelname, '4.1.1');
        cd ../models
        save([modelname '/' modelname '.mat']','ecModel');
        save([modelname '/' modelname '_batch.mat'], 'ecModel_batch');
        quit

{ Unable to perform assignment with 0 elements on the right-hand side.

Error in manualModifications>otherChanges (line 425)
         rxn_id   = model.rxns{pos_rxn};

Error in manualModifications (line 113)
model = otherChanges(model);

Error in enhanceGEM (line 37)
[ecModel,modifications] = manualModifications(ecModel);
}

For Human-GEM I think we arrived to some conclusion wrt not providing any tissue GEMs, so there would be only one ecHuman-GEM which would make it fit with the rest of the pipeline, but I might be confused.

IVANDOMENZAIN commented 4 years ago

You're right @mihai-sysbio, for HumanGEM we're just thinking on adding the original general model into the ecModels pipeline.

mihai-sysbio commented 3 years ago

I'm jotting down some ideas to discuss at the next meeting with @IVANDOMENZAIN and @BenjaSanchez :

mihai-sysbio commented 3 years ago

Follow-up:

mihai-sysbio commented 3 years ago

After fixing the issue with merging the content of the scripts folder, the pipeline successfully created eciYali in only about half an hour. However, ecHuman still fails after 2.5h of computation.

IVANDOMENZAIN commented 3 years ago

Great news about iYali. Regarding ecHuman, it is a large model, so it is expected to take longer than the others, specially in the steps of Kcat matching and saveECmodel (conversion to SBML). I can see the MATLAB log provided by GitHub actions but I cannot find any error message.

I can see the last MATLAB line being: prot_abundance file is not available. A default value of f=0.5 is set instead, this means that the pipeline is entering the function getConstrainedModel.m but apparently finding problems there. I will run this locally from this specific branch in order to find what's going on. Thanks for the update @mihai-sysbio.

mihai-sysbio commented 3 years ago

The error is printed just above the Matlab output:

Error using getIndexes (line 70)
Incorrect value of the 'type' parameter. Allowed values are 'rxns', 'mets',
'metnames', 'metcomps' or 'genes'.

Error in changeMedia_batch (line 119)
        metIndx = getIndexes(model,strcat(mediaComps{i},'[s]'),'metscomps');

Error in getConstrainedModel (line 46)
ecModel = changeMedia_batch(ecModel,c_source);

Error in enhanceGEM (line 63)
[ecModel_batch,OptSigma] = getConstrainedModel(ecModel,modifications,name);

Yeah, about that prot_abundance file. From what I could see, this file is deleted only for Human-GEM here. Since this deletion is model specific, I went around it by overwriting that file with an empty file with the same name. I guess GECKO reacts differently to an empty file versus the file not being present?

IVANDOMENZAIN commented 3 years ago

Good! that you're pointing this out @mihai-sysbio, the file prot_abundance.txt should actually be deleted for all those models for which such file is not provided in the scripts folder (human, yarrowia, marxianus).

mihai-sysbio commented 3 years ago

I see. Personally, I would find it simpler to provide an empty file and update a line in measureAbundance to also check for the file size to be non-zero. I'm thinking this would be more transparent to the user that

all files provided in scripts override what is in GECKO

and to avoid having an exception like

if specific files are not provided in scripts they are deleted from GECKO.

IVANDOMENZAIN commented 3 years ago

I see. Personally, I would find it simpler to provide an empty file and update a line in measureAbundance to also check for the file size to be non-zero. I'm thinking this would be more transparent to the user that

all files provided in scripts override what is in GECKO

and to avoid having an exception like

if specific files are not provided in scripts they are deleted from GECKO.

@mihai-sysbio, a problem with this fix would be that several projects that have incorporated proteomics data into ecYeastGEM rely on this function, and this is an actual crucial step for their incorporation. I also like your suggested fix but this will also break backwards compatibility for those other projects. Easy to solve If I were the only person using this for proteomics but now we even have users from other groups relying on this.

IVANDOMENZAIN commented 3 years ago

@mihai-sysbio What about a halfway solution? by adding a second conditional in GECKO/geckomat/limit_proteins/measureAbundance from:

if exist(fileName,'file')~= 0

to:

if exist(fileName,'file')~= 0
    fID         = fopen('../../databases/prot_abundance.txt');
    data        = textscan(fID,'%s','delimiter','\n');
    if ~isempty(data{1})

In this case empty files can also be compatible with this and wouldn't break compatibility for those users relying on scripts that delete the prot_abundance.txt file prior to proteomics integration.

mihai-sysbio commented 3 years ago

That sounds excellent! Let me know what branch of GECKO this is on, so I can use that instead of master. And I'll add an empty prot_abundance.txt to all 4 GEMs (listed at the top of this issue).

IVANDOMENZAIN commented 3 years ago

@mihai-sysbio, the fix has been introduced and tested both for an empty and non-empty prot_abundance.txt file, you can find it here

mihai-sysbio commented 3 years ago

The pipeline managed to create ecHuman. It took about 8.5h, which I find a bit much. I have already begun setting up a new self-hosted runner for this repository, which I hope will speed things up.

mihai-sysbio commented 3 years ago

These are 3 models remaining at the top of this issue:

BenjaSanchez commented 3 years ago

for ecYeast I would need to know which Cobratoolbox version can successfully write (output) the model. 3.1 has been shown to not work.

It turns out that the pipeline was failing with the same error for several versions of yeast-GEM (everything from yeast8.2.0 onwards), due to an addition in the notes field of the COBRA structure. I have addressed this with a temporary fix in https://github.com/SysBioChalmers/GECKO/pull/125, an opened https://github.com/SysBioChalmers/GECKO/issues/126 as a reminder to better address the problem in the future.

I still think we should pin COBRA to avoid future issues like this, until they have a more consistent releasing scheme. In particular, I confirmed the GECKO pipeline is fully working with commit 6c49aaf8d.

mihai-sysbio commented 3 years ago

Excellent! I've changed the COBRA version to that commit, and used the fix/remove-met-notes branch of GECKO. ~However, it still fails to write the model for ecYeast.~

edit: ecYeast now works! see this PR

mihai-sysbio commented 3 years ago

Now that eciSM996 works, what do you think about #29 - are we ready to merge in the MetabolicAtlas/ecModels fork?

mihai-sysbio commented 3 years ago

I'm thinking of closing this issue as is, and opening a new one specifically for the (potential) problems in ecMouseGEM.

IVANDOMENZAIN commented 3 years ago

@mihai-sysbio I agree