draeger-lab / refinegems

refineGEMs is a Python package intended to help with the curation of genome-scale metabolic models (GEMs).
https://refinegems.readthedocs.io/en/latest/
MIT License

Redesign `main.py` to be more flexible #46

Closed famosab closed 1 year ago

famosab commented 1 year ago

@GwennyGit mentioned that some people might want to use all model-modifying functions at once.

I also realised that using only selected functions, but for multiple models, might be desirable.

The `main.py` and consequently the `config.yml` file can be reevaluated and redesigned to fit a user working with refineGEMs only as a standalone program. However, we can also keep in mind that I'll provide a Jupyter notebook (see #22) that will showcase how to use the different functions. In comparison with `main.py`, people should be able to tweak refineGEMs to their needs there. This probably needs a bit more discussion and thought on how to structure `main.py` in a way that is actually usable. It might be easier to focus on the main functions that an inexperienced user might need and tailor it that way. Everything else will most likely be used by people experienced with Python, so a huge `main.py` script might not be desirable.

@GwennyGit @draeger What are your opinions on this?

draeger commented 1 year ago

My understanding is that @GwennyGit would like a default model refinement routine that sequentially runs all individual functions without further interaction and provides an updated model as a result. Also, one of the questions she had is whether complementary tools (ModelPolisher in particular) should be used before or after improving a model with refineGEMs. To me, it seems useful to have an easily usable function that automatically tries out many refinement steps (if no such function exists yet).
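
To make the idea concrete, a minimal sketch of such a default routine might look like the following. The step functions are placeholders standing in for the refineGEMs modules discussed in this thread (polishing, SBO annotation, KEGG pathways as groups, charge correction), not the actual package API.

```python
from typing import Callable, List

# Placeholder steps; each stands in for one refineGEMs module and simply
# returns the (possibly modified) model. The real function names and
# signatures in refineGEMs may differ.
def polish(model):
    return model

def annotate_sbo(model):
    return model

def kegg_pathway_groups(model):
    return model

def correct_charges(model):
    return model

DEFAULT_STEPS: List[Callable] = [polish, annotate_sbo, kegg_pathway_groups, correct_charges]

def refine_all(model):
    """Run every refinement step sequentially, without further interaction,
    and return the updated model."""
    for step in DEFAULT_STEPS:
        model = step(model)
    return model
```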

famosab commented 1 year ago

Maybe it makes sense to distribute both separately:

GwennyGit commented 1 year ago

Yes, my thought was that it would be quite nice to have such a default routine that applies all refinement steps at once. However, after using ModelPolisher I think a better routine would be to run the polish CarveMe (now: polish) module first, then ModelPolisher, and afterwards the remaining refinement modules. It would be good to hint to users that if they want to use ModelPolisher, it should run before the SBO annotation as well as before the KEGG pathways as groups modules. It might even be good to apply the charge correction module only after ModelPolisher.
My reason for this is that when I ran ModelPolisher on a model that had already been refined by the refinement modules of refineGEMs, more database annotations were added to the model. Hence, I will rerun the SBO annotation as well as the KEGG pathways as groups module. For the charge correction module I am still unsure whether I want to run it again, since ModelPolisher also improved my charges.
**Update on using the charge correction module after running ModelPolisher:** As ModelPolisher improved the charges of my model, I will not apply the charge correction module a second time. However, I still think using the polish as well as the charge correction module before applying ModelPolisher makes sense.
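
For reference, the ordering suggested here could be written down roughly as below. The labels are illustrative and do not correspond one-to-one to refineGEMs module names, and ModelPolisher is an external tool run outside of refineGEMs.

```python
# Recommended ordering as sketched in this comment (labels are illustrative):
RECOMMENDED_ORDER = [
    "polish",              # CarveMe polishing (refineGEMs)
    "charge_correction",   # before ModelPolisher
    "ModelPolisher",       # external tool; adds further database annotations
    "sbo_annotation",      # after ModelPolisher, so the new annotations are covered
    "kegg_pathway_groups", # same reasoning as for the SBO annotation
]
```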

famosab commented 1 year ago

Maybe it makes sense to provide a sort of second `main.py` script which does what you described. That would also lead to a simpler second YAML. Maybe we can store the older main script in a separate folder aimed at experienced users, while the new main script just applies everything, writes any output into a log file (as described in #42), and returns the changed model to the user. Then we can keep the "complicated" YAML file and use a simpler, well-commented YAML file (see example in #57).

We would need to collect the list of steps that can be automated into that simpler script. I would suggest one run of growth simulation before and after the curation so that the user can see what changed in the model's behaviour.
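
A possible shape for that simpler, well-commented YAML, loaded in the usual way with PyYAML. The key names are modelled on the full config shown at the end of this thread but are otherwise an assumption.

```python
import yaml  # PyYAML

# Hypothetical minimal config for the "apply everything" script; key names are
# modelled on the full config at the end of this thread but are an assumption.
SIMPLE_CONFIG = """
model: my_model.xml     # input model to be refined
out_path: ../rg_out/    # where the modified model and the log file are written
media:                  # media for the growth simulation before and after curation
  - SNM3
  - LB
"""

settings = yaml.safe_load(SIMPLE_CONFIG)
print(settings["model"], settings["media"])
```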

famosab commented 1 year ago

Should we, by default, save the modified models to the `out_path` and extend the filename with `_kegg` (etc.)? I think that would make sense, as long as we still allow the user to define their own filename.
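
A rough sketch of the per-module naming scheme proposed here, with the suffix appended to the original file stem; the suffix names are illustrative and this is not how it was implemented in the end (see the later comments).

```python
from pathlib import Path

# Sketch of the proposed default: save to out_path and append a module suffix
# (e.g. _kegg) to the original filename, unless the user defines their own name.
def suffixed_filename(model_path: str, out_path: str, suffix: str) -> Path:
    return Path(out_path) / f"{Path(model_path).stem}_{suffix}.xml"

print(suffixed_filename("iYL1228.xml", "../rg_out/", "kegg"))
# ../rg_out/iYL1228_kegg.xml
```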

GwennyGit commented 1 year ago

Yes, I think that is a good idea. 👍🏻

famosab commented 1 year ago

I did not implement separate saving for each module; instead, the user is prompted to define an `out_path`, or the model is saved to the out path as `<model.id>_modified_<date>.xml`. I am currently testing a run through multiple model modifications (KEGG pathways, SBO terms and polish) and will see whether that works.
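
A small sketch of that fallback naming, assuming an ISO-formatted date; the exact formatting used in refineGEMs may differ.

```python
from datetime import date
from pathlib import Path

# Fallback naming described above: <model.id>_modified_<date>.xml in out_path.
def default_outfile(model_id: str, out_path: str) -> Path:
    return Path(out_path) / f"{model_id}_modified_{date.today().isoformat()}.xml"

print(default_outfile("iYL1228", "../rg_out/"))
# ../rg_out/iYL1228_modified_<today's date>.xml
```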

famosab commented 1 year ago

I tested refineGEMs on two random models downloaded from the BiGG database (iYL1228 and iAF987) as well as on my own models, using the config shown below. It seems to me that everything in the new main file works as expected and all modules are executed as intended. I would close this issue for now since `main.py` is now able to execute multiple model curation steps. I left the old `main.py` in the repo for reference, but it can probably be deleted after a while.

```yaml
charge_corr: true
growth_basis: default_uptake
id_db: BIGG
media:
- SNM3
- LB
- M9
- CGXlab
memote: false
model: /Users/baeuerle/Organisation/Masterarbeit/rg_out/iYL1228.xml
out_path: ../rg_out/
polish: true
sboterms: true
visualize: true
```