Open feiranl opened 2 years ago
Maybe store this information in a TSV file first? @feiranl @edkerk
That's a good idea. It's not that straightforward where this information should be stored. In the long run it should ideally be kept in the model file, and not just as a separate TSV. But this raises three issues:
This is not of the highest priority, but it would be good to start a discussion on how to overcome especially the first two issues.
Once we have the curated datasets, can we put them under this folder? https://github.com/SysBioChalmers/yeast-GEM/tree/main/data/databases
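For discussion, here is a minimal sketch of what such a TSV could look like and how it might be read. The column names (`complex_id`, `gene`, `copies`) and the example rows are placeholders I made up, not curated data or an agreed format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format table: one row per (complex, subunit) pair.
# Column names and the example IDs are placeholders, not curated data.
EXAMPLE_TSV = (
    "complex_id\tgene\tcopies\n"
    "CPX-EXAMPLE\tgeneA\t2\n"
    "CPX-EXAMPLE\tgeneB\t1\n"
)


def load_subunit_copies(handle):
    """Return {complex_id: {gene: copies}} from a tab-separated file handle."""
    table = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["complex_id"]][row["gene"]] = int(row["copies"])
    return dict(table)


copies = load_subunit_copies(io.StringIO(EXAMPLE_TSV))
```

One row per subunit keeps the file easy to diff and review, which probably matters more here than compactness.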
- Stoichiometric coefficients are missing in the current GEM but are available in the Complex Portal and PDB databases.
I'm really unsure about which coefficients this discussion is about, could you give an example?
The number of protein subunits that make up a protein complex. It is not always just 1 copy of each subunit. See for instance pyruvate dehydrogenase: https://www.ebi.ac.uk/complexportal/complex/CPX-3207:
This would for instance make a difference if the model would be turned into an ec-model, but there are likely also other use cases.
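To illustrate why the copy numbers matter for an ec-model, here is a rough sketch. It assumes the protein cost of a reaction scales with the total mass of the enzyme complex. The subunit names, masses and copy numbers are invented for illustration, and `complex_mass` is a hypothetical helper, not a GECKO function.

```python
def complex_mass(subunit_mw_kda, copies=None):
    """Sum subunit masses, weighting each by its copy number.

    Leaving copies as None reproduces the "1 copy per subunit" assumption.
    """
    copies = copies or {}
    return sum(mw * copies.get(name, 1) for name, mw in subunit_mw_kda.items())


# Made-up masses (kDa) and copy numbers, for illustration only.
mw = {"subunitA": 50.0, "subunitB": 40.0}
stoich = {"subunitA": 2, "subunitB": 4}

mass_one_copy = complex_mass(mw)               # 50 + 40
mass_with_stoich = complex_mass(mw, stoich)    # 2*50 + 4*40
```

With these invented numbers the complex is almost three times heavier once stoichiometry is included, so the enzyme cost per unit flux would shift by the same factor.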
Thanks for the example @edkerk. Wouldn't then this be more useful to be dealt with by a future version of GECKO instead? I'm thinking that it would make more sense for this information to be following the same structure regardless of the model.
But that is if GECKO is the only purpose. GECKO should be modified to be able to deal with such information anyway (it currently assumes 1 copy per subunit), but should that information only be provided in GECKO, or distributed as part of the generic yeast-GEM, also ready for other applications?
generic yeast-GEM, also ready for other applications
It seems there are no direct applications of this information in yeast-GEM?
I'm very hesitant to "copy" external data in any repository, unless it would be useful for a good chunk of the userbase.
also ready for other applications?
I see (but I don't know how often this would be the case). Perhaps what can be stored in this repository is a script that fetches the data via an API, and maybe does some reformatting, but not the data itself, since stored data goes stale by default and takes work to keep up to date.
I agree with Mihail, but here it is not just a "copy". We get the stoichiometry information from Complex Portal and the PDB database. The script extracts protein structures from PDB, maps the subunits through sequence alignment, and then determines the stoichiometry.
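As a sketch of the reformatting part only (not the structure extraction or alignment), the snippet below parses a participant string into copy numbers. It assumes the stoichiometry arrives as `ID(n)` tokens separated by `|`, and that a count of 0 means "unknown". Both are assumptions about the export format that should be checked against the actual Complex Portal files. The function name and the example identifiers are hypothetical.

```python
import re

# Assumed token shape: <identifier>(<integer copies>), tokens separated by "|".
_TOKEN = re.compile(r"^(?P<id>[^()|]+)\((?P<n>\d+)\)$")


def parse_stoichiometry(field):
    """Turn 'ID1(2)|ID2(1)' into {'ID1': 2, 'ID2': 1}.

    A count of 0 is mapped to None, on the assumption that 0 marks an
    unknown copy number rather than a real value.
    """
    result = {}
    for token in field.split("|"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        n = int(match.group("n"))
        result[match.group("id")] = n if n > 0 else None
    return result


parsed = parse_stoichiometry("PROT_A(2)|PROT_B(1)|PROT_C(0)")
```

Keeping unknown counts as `None` instead of silently falling back to 1 would let downstream tools decide for themselves how to treat missing stoichiometry.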
Description of the issue: