brsynth / RetroPathRL

Reinforcement Learning based bioretrosynthesis tool
MIT License

Sculpting the available pathway #7

Closed ae-iced closed 4 years ago

ae-iced commented 4 years ago

This is not an issue, but a feature request/question:

I want to block certain pathways, so that I can remove certain intermediates from the sinks. Say I want to keep glycerol, glucose, NADH, etc. from contributing to the sinks, but I don't want to block them from acting as intermediates. How can I do that?

What I've done so far is edit one of the sink files and remove the offending sink chemicals, then edit the calculate_organisms.py, Tree.py and organisms.py files slightly to add this new set of sinks as a new sink/organism, so it can generate its pickles (What IS that??) and go about its business. Is this the right way to do it? If something is not in the sinks, can it still be an intermediate of a pathway? Is there a better way to do this?

Bonus questions: How can I bias the moves? I can't find much about the toxicity bias and how it's applied. Am I missing something in the papers/supplements?

Again, this has a lot of potential owing to its flexibility. Thank you!

bdelepine commented 4 years ago

Hi @ae-iced,

If I understand you correctly, you want the final/ideal pathway found by RP3 to biologically start from a restricted list of chemicals that would be "allowed starting-points". Additionally, you do not want to forbid any particular chemical from being used as an intermediate... and you are afraid that by tweaking the "sink" parameter, you will end up breaking something.

The initial "sink" is precisely what you are looking for, so you are doing (almost) everything as you should. The initial sink is defined by the --organism_name parameter, to which the chemicals from --complementary_sink are added. Let me explain what this sink is and the logic behind those parameters.

The sink is, by definition, the list of chemicals that will not trigger a new retrosynthesis step: they are end-points for the retrosynthesis algorithm (which works backward from your target chemical to the chassis/organism, i.e. your "allowed starting-points"). The sink is updated at each retrosynthesis step by adding the metabolite that is being looked at, so that the reactions it can be part of are computed only once. By setting an initial sink, you define the "allowed starting-points" for your final pathway. Additionally, RP3 uses the initial sink when it counts the number of pathways found. This idea of a "sink" that is both the list of "allowed starting-points" and the updated list of "what should not trigger a new step" may be a bit confusing.
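To make the two roles of the sink concrete, here is a minimal toy sketch. It is not RP3 code: the function names, the `rules` dictionary and the compound names are all invented for illustration, and a plain breadth-first search stands in for the real search procedure. It only shows a working sink that grows as compounds are expanded, next to an initial sink that decides whether a pathway counts as found.

```python
from collections import deque

def retro_search(target, rules, initial_sink, max_steps=100):
    """Toy backward search (hypothetical, not the RP3 implementation).

    `rules` maps a product to a list of precursor lists.
    Returns the compounds that were expanded and the final working sink.
    """
    working_sink = set(initial_sink)  # grows during the search
    queue = deque([target])
    expanded = []
    while queue and len(expanded) < max_steps:
        compound = queue.popleft()
        if compound in working_sink:
            continue  # sink members never trigger a new step
        working_sink.add(compound)  # so each compound is expanded only once
        expanded.append(compound)
        for precursors in rules.get(compound, []):
            queue.extend(p for p in precursors if p not in working_sink)
    return expanded, working_sink

def is_solved(compound, rules, initial_sink, _seen=frozenset()):
    """True if the compound can be traced back to the *initial* sink only."""
    if compound in initial_sink:
        return True
    if compound in _seen:
        return False  # avoid looping on cyclic rules
    seen = _seen | {compound}
    return any(
        all(is_solved(p, rules, initial_sink, seen) for p in precursors)
        for precursors in rules.get(compound, [])
    )

# Invented toy data: two ways to make "target".
rules = {"target": [["A", "NADH"], ["glucose"]], "A": [["pyruvate"]]}
expanded, working_sink = retro_search("target", rules, {"pyruvate", "NADH"})
solved_with_cofactor = is_solved("target", rules, {"pyruvate", "NADH"})
solved_without_cofactor = is_solved("target", rules, {"pyruvate"})
```

In this toy, glucose is absent from the initial sink, so it gets expanded like any other intermediate. Dropping NADH from the initial sink leaves the target unsolved, because nothing in the toy rules produces NADH.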

So, to get back to your problem, you can:

Just remember that every chemical in an RP3 pathway must be either the target, one of the "allowed starting-points", or an intermediate between those two. So you most probably want to keep the usual cofactors in this list of "allowed starting-points"; otherwise you will spend a lot of time inventing pathways for ATP, etc.
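As a rough illustration of that advice, the sketch below removes unwanted chemicals from a sink table and then reports which expected cofactors are missing. It makes several assumptions: that the sink is a CSV with a `name` column, that matching by name is good enough (matching on structure may be more robust), and that the InChI values shown are placeholders rather than real identifiers. The helper names are hypothetical and not part of RetroPathRL.

```python
import csv
import io

def filter_sink(rows, blocked_names, name_column="name"):
    """Return the sink rows whose name is not blocked (case-insensitive)."""
    blocked = {n.strip().lower() for n in blocked_names}
    return [r for r in rows if r[name_column].strip().lower() not in blocked]

def missing_cofactors(rows, required_names, name_column="name"):
    """Return the required names that are absent from the sink rows."""
    present = {r[name_column].strip().lower() for r in rows}
    return sorted(n for n in required_names if n.strip().lower() not in present)

# Placeholder sink content; the column layout is an assumption.
SINK_CSV = """name,inchi
glucose,InChI=placeholder-1
glycerol,InChI=placeholder-2
NADH,InChI=placeholder-3
ATP,InChI=placeholder-4
"""

rows = list(csv.DictReader(io.StringIO(SINK_CSV)))
kept = filter_sink(rows, ["glucose", "glycerol"])
missing = missing_cofactors(kept, ["ATP", "NADH", "O2"])
print([r["name"] for r in kept], missing)
```

Running the cofactor check after every edit of the sink is a cheap way to notice that something like O2 has dropped out before launching a long search.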

The bonus question is worth another GitHub issue ;) To point you in the right direction, you can have a look at... the source code: https://github.com/brsynth/RetroPathRL/blob/fb8e1db013e2c1679ed8def2392af619af7ee4b3/compound_scoring.py#L46 and https://github.com/brsynth/RetroPathRL/blob/fb8e1db013e2c1679ed8def2392af619af7ee4b3/UCT_policies.py#L221 You will need to code similar functions and import them in all relevant files (or change the toxicity one).
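For readers unfamiliar with how a bias can enter a tree-search policy, here is a generic sketch of the standard UCT score with an added bias term that fades as a node is visited more often (a "progressive bias"). This is a textbook-style formula, not the code behind the links above; the function names, the `bias` field and the weights are assumptions made for illustration only.

```python
import math

def biased_uct(total_reward, visits, parent_visits,
               bias=0.0, c=math.sqrt(2), bias_weight=1.0):
    """Generic UCT score plus a bias term (hypothetical, not RetroPathRL's)."""
    if visits == 0:
        return math.inf  # always try unvisited moves first
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    # The bias matters most early on and fades as visits accumulate.
    progressive_bias = bias_weight * bias / (visits + 1)
    return exploitation + exploration + progressive_bias

def select_child(children, parent_visits, **kwargs):
    """Pick the child dict with the highest biased UCT score."""
    return max(
        children,
        key=lambda ch: biased_uct(ch["reward"], ch["visits"], parent_visits,
                                  ch.get("bias", 0.0), **kwargs),
    )

# Two otherwise identical moves; one carries a penalty (e.g. a toxicity score).
children = [
    {"name": "toxic", "reward": 5.0, "visits": 10, "bias": -1.0},
    {"name": "benign", "reward": 5.0, "visits": 10, "bias": 0.0},
]
best = select_child(children, parent_visits=20)
```

A custom bias would replace the `bias` value with whatever score you compute per compound or per move, which is the role the scoring and policy functions linked above play in the real code base.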

HTH, best

ae-iced commented 4 years ago

Thank you for the detailed response.

I like approach #2 (organism set to none), but considering your answer, I'll have to spend some time on the sink to make sure I don't run out of O2, ATP, NAD, etc. and have the necessities in the sink.

Cheers