hard-coded data file & script locations

SysBioChalmers / GECKO

Toolbox for including enzyme constraints on a genome-scale model.

http://sysbiochalmers.github.io/GECKO/

MIT License


hard-coded data file & script locations #55

Closed by edkerk 1 year ago

edkerk commented 6 years ago

In measureAbundance, the location of the proteomics data is hard-coded. This is inconvenient if, for instance, one has proteomics data for many conditions and wants to build a model for each of them: the user would have to replace the databases/prot_abundance.txt file every time. It would probably be better to have genes and abundance as parameters to the function.
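A minimal sketch of that suggestion, written in Python rather than GECKO's MATLAB. Everything here is hypothetical: the name `read_abundance`, the two-column tab-separated layout, and the toy values. The point is only that the caller supplies the data source, so one call per condition works without overwriting a shared file.

```python
import csv
import io


def read_abundance(handle):
    """Parse 'gene<TAB>abundance' rows from any open text handle.

    The two-column layout is an assumption for this sketch, not the
    documented format of prot_abundance.txt.
    """
    genes, abundance = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        genes.append(row[0])
        abundance.append(float(row[1]))
    return genes, abundance


# Toy data for two conditions; io.StringIO stands in for open(path).
conditions = {
    "glucose": io.StringIO("# toy data\nYAL001C\t12.5\nYAL002W\t3.0\n"),
    "ethanol": io.StringIO("YAL001C\t8.0\nYAL002W\t6.5\n"),
}

# One (genes, abundance) pair per condition, ready to pass to a
# function that takes them as parameters.
per_condition = {name: read_abundance(h) for name, h in conditions.items()}
```

With real files, each `io.StringIO(...)` would be replaced by `open(path)` for that condition's file.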

BenjaSanchez commented 6 years ago

@edkerk note that measureAbundance is not meant for constraining the model with proteomics, but for reading Pax-DB data and estimating the fraction [g/g] of a group of enzymes with respect to the total. This fraction is needed to (later) set a constraint on the protein pool for enzymes that don't have proteomic measurements. The function you would want to use for the purpose you describe is constrainEnzymes, which accepts inputs of pIDs and data as you suggest.
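To make the [g/g] fraction concrete, here is a rough Python sketch. It is not the measureAbundance implementation. It assumes that relative abundances are converted to mass by weighting each protein with its molecular weight, and that proteins without a known molecular weight are skipped. The function name, argument names, and numbers are all hypothetical.

```python
def enzyme_mass_fraction(abundance, mol_weight, enzyme_ids):
    """Return the mass fraction [g/g] of `enzyme_ids` among all proteins.

    abundance  : dict, protein ID -> relative abundance (e.g. ppm)
    mol_weight : dict, protein ID -> molecular weight (g/mol)
    enzyme_ids : set of protein IDs that belong to the group of interest
    """
    total_mass = 0.0
    group_mass = 0.0
    for pid, amount in abundance.items():
        mw = mol_weight.get(pid)
        if mw is None:
            continue  # assumption: ignore proteins with unknown weight
        mass = amount * mw  # relative mass contribution of this protein
        total_mass += mass
        if pid in enzyme_ids:
            group_mass += mass
    if total_mass == 0.0:
        raise ValueError("no protein with both abundance and weight")
    return group_mass / total_mass


# Toy numbers, chosen only to make the arithmetic easy to follow.
abundance = {"P1": 100.0, "P2": 300.0, "P3": 600.0}
mol_weight = {"P1": 50000.0, "P2": 20000.0, "P3": 10000.0}
f = enzyme_mass_fraction(abundance, mol_weight, {"P1", "P2"})
# masses: P1 = 5e6, P2 = 6e6, P3 = 6e6, so f = 11e6 / 17e6
```

In this toy case the group accounts for 40% of the abundance counts but about 65% of the mass, which is why the weighting matters for a [g/g] quantity.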

That being said, I agree with the problem of hard-coded locations (e.g. of prot_abundance.txt), and we will change this at the toolbox level. The idea would be to require that the /geckomat* and /databases* paths are added before using GECKO; that way we can avoid any relative paths and use those functions/data from any other folder. Let us know if you have any thoughts on this proposal.
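The general idea of resolving data through a list of registered folders, instead of a path relative to the current directory, can be sketched as follows. This is a Python illustration of the concept only, not how GECKO or the MATLAB path works; `find_data_file` and the folder layout are hypothetical.

```python
import tempfile
from pathlib import Path


def find_data_file(name, search_dirs):
    """Return the first existing file called `name` in `search_dirs`."""
    for folder in search_dirs:
        candidate = Path(folder) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{name} not found in any registered folder")


# Demo with a throwaway folder tree standing in for a toolbox checkout.
with tempfile.TemporaryDirectory() as root:
    databases = Path(root) / "databases"
    databases.mkdir()
    (databases / "prot_abundance.txt").write_text("toy content\n")

    # The registered folders are absolute, so the lookup does not
    # depend on the directory the caller happens to be in.
    registered = [Path(root) / "geckomat", databases]
    found = find_data_file("prot_abundance.txt", registered)
    found_name = found.name
    found_parent = found.parent.name
```

The lookup fails loudly with `FileNotFoundError` when a folder was not registered, which is easier to diagnose than a silently wrong relative path.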

edkerk commented 6 years ago

Thanks for the explanation, this could go directly in the documentation! :) What if no Pax-DB data is available for my organism of interest?

Requiring the locations of the /geckomat* and /databases* paths to be defined as parameters sounds like a good solution.

BenjaSanchez commented 6 years ago

"What if no Pax-DB data is available for my organism of interest?"

I guess it would have to be replaced with some proxy, e.g. the fraction of enzymes from the total, although that would underestimate metabolic enzymes... @IVANDOMENZAIN any ideas here?

IVANDOMENZAIN commented 6 years ago

@edkerk I have used relative proteomics datasets (when available) as a substitute for the prot_abundance.txt file. The f values that I have obtained for different organisms with this approach range from 0.3 to 0.48. As the f value is used for constraining the protein pool, I think that using a high value such as 0.48 or 0.5 also makes some sense, because the protein pool then becomes limiting only for growth at very high rates (simulating batch conditions), not for chemostat simulations with microbial models.

BenjaSanchez commented 4 years ago

Something to address here brought up by @sulheim:

parameter file can also provide the paths for the required database-files, cultivation data etc.
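One possible shape for such a parameter file, sketched in Python with JSON. The format, the key names (`paths`, `prot_abundance`, `cultivation_data`), the file names, and the `load_paths` helper are all assumptions made for illustration; the thread does not specify any of them.

```python
import json
from pathlib import Path

# Hypothetical parameter file content; in practice this would be read
# from disk, e.g. Path("params.json").read_text().
PARAM_TEXT = """
{
  "paths": {
    "prot_abundance": "databases/prot_abundance.txt",
    "cultivation_data": "databases/cultivation.txt"
  }
}
"""


def load_paths(param_text, base_dir):
    """Map each named entry under 'paths' to a path below `base_dir`.

    Absolute entries are kept as given, because joining a base
    directory with an absolute path returns the absolute path.
    """
    params = json.loads(param_text)
    return {key: Path(base_dir) / rel for key, rel in params["paths"].items()}


paths = load_paths(PARAM_TEXT, "/project")
```

Functions would then receive `paths["prot_abundance"]` as an argument, so switching conditions or organisms means editing the parameter file rather than the code.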

edkerk commented 1 year ago

This will be completely revamped with GECKO3; the discussion here is obsolete.