feat: universal DLKcat.tsv for both light and full ecModels

SysBioChalmers / GECKO

Toolbox for including enzyme constraints on a genome-scale model.

http://sysbiochalmers.github.io/GECKO/

MIT License

66 stars 51 forks source link

feat: universal DLKcat.tsv for both light and full ecModels #248

Open Yu-sysbio opened 1 year ago

Yu-sysbio commented 1 year ago

https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/tutorials/protocol.m#L106

I ran this successfully when building a full ecModel, but got the error below when building a light ecModel:

Error using readDLKcatOutput (line 56) Not all reactions from DLKcat output can be found in model.ec.rxns

edkerk commented 1 year ago

Did you use the same DLKcat output file to make both a light and full ecModel?

ae-tafur commented 1 year ago

Maybe it because here:

https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/src/geckomat/gather_kcats/readDLKcatOutput.m#L55

it compares the rxn identifiers but in light version, the identifiers in model.ec.rxns have a prefix 001_r_001

edkerk commented 1 year ago

Indeed, this check is to make sure that the kcatList can later be used by selectKcatValue, which just matches by reaction ID: https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/src/geckomat/gather_kcats/selectKcatValue.m#L51

Short term solution is to have the error suggest to make full/light specific versions of the DLKcat.tsv file. Longer term solution is to have selectKcatValue not just check by the ec.rxns style reaction identifier, but instead removes any suffices/prefixes (so returns to the ecModel.rxns format, and also matches to the enzymes that are annotated. Or something similar, at least not only matching by reaction identifier. This will require some more refactoring and testing.

Yu-sysbio commented 1 year ago

Did you use the same DLKcat output file to make both a light and full ecModel?

Yes

Yu-sysbio commented 1 year ago

I just notice that the DLKcat output files generated by full and light protocols are not interchangeable due to rxn IDs in these two types of ecModels. I do not think that it is very necessary to make them interchangeable by refactoring code. Instead I prefer the short-term solution, and we should clarify this in the protocol.

edkerk commented 1 year ago

Mentioned in the protocol. Longer term goal would be to enhance the parsing of the file, to avoid this.

mihai-sysbio commented 1 year ago

Instead of mapping via the reaction ID, perhaps we could use another ID as a unique identifier for the sequence, and do the mapping back to the reaction within Matlab. From a modularity point of view, the current DLKcat.tsv mixes the concerns of GECKO and DLKcat. Was there a specific reason for avoiding gene IDs?

edkerk commented 1 year ago

No reason to avoid gene ID, that is actually my plan, to map by the core of the reaction ID (so without prefixes and suffixes from light and full ecModels) and the gene ID. And it's a GECKO-only concern, DLKcat doesn't care about identifiers.