connorcoley / rdchiral

Wrapper for RDKit's RunReactants to improve stereochemistry handling
MIT License

The explicit definition of "spectator" and its impact on RDChiral #45

Open 0m1n0 opened 1 year ago

0m1n0 commented 1 year ago

Hi,

I work in the bioinformatics field and I found your tool very interesting.
I saw that a reaction can be written as:

In biology and biochemistry, a cofactor can be present in a reaction. It is defined as follows (from Wikipedia):

A cofactor is a non-protein chemical compound or metallic ion that is required for an enzyme's role as a catalyst. Cofactors can be considered "helper molecules" that assist in biochemical transformations.

Cofactors can be divided into two major groups: organic cofactors, such as flavin or heme; and inorganic cofactors, such as the metal ions Mg2+, Cu+, Mn2+ and iron–sulfur clusters.

I assume that this term does not correspond to spectator.

So here are my questions:

  1. Could you give me an explicit definition of "spectator" as used in your code?
  2. Are the results significantly different with or without a spectator?
  3. Given how cofactors behave, is it better to include the cofactor as both a reactant and a product (it is present on both sides of the reaction with a minor modification)? I'd like to extract templates that don't depend on a cofactor.

I'm sorry to bother you with these beginner's questions. Any advice is welcome and appreciated. Thank you :)

Min

connorcoley commented 1 year ago

From the perspective of templates, spectators are completely ignored right now. A spectator would be a component that does not contribute heavy atoms to the product molecule (according to the atom mapping). If a co-factor is present on both sides with no modification, I would add it in as a post-processing step.
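The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not RDChiral's own code: it assumes a Daylight-style reaction SMILES (`reactants>agents>products`), assumes every mapped atom is a heavy atom (no mapped explicit hydrogens), and reads atom-map numbers with a regular expression instead of parsing the molecules. The helper names `map_numbers`, `split_reaction` and `find_spectators` are hypothetical.

```python
import re

# Matches the ":N]" that closes a mapped bracket atom such as "[CH3:1]".
MAP_RE = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers found in a SMILES string."""
    return {int(n) for n in MAP_RE.findall(smiles)}


def split_reaction(rxn_smiles):
    """Split 'reactants>agents>products' into three lists of component SMILES."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    return [[c for c in part.split(".") if c] for part in parts]


def find_spectators(rxn_smiles):
    """List components that contribute no mapped atom to any product.

    Assumption: every mapped atom is a heavy atom, so "shares no map
    number with the products" stands in for "contributes no heavy atoms".
    """
    reactants, agents, products = split_reaction(rxn_smiles)
    product_maps = set().union(*(map_numbers(p) for p in products))
    spectators = list(agents)  # anything between the two '>' is never a reactant
    spectators += [r for r in reactants if not (map_numbers(r) & product_maps)]
    return spectators


# Acid-catalysed esterification; the unmapped sulfuric acid is a spectator.
rxn = ("[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6].O=S(=O)(O)O"
       ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]")
print(find_spectators(rxn))  # ['O=S(=O)(O)O']
```

A cofactor that is atom mapped on both sides shares map numbers with a product, so under this definition it is not a spectator.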

0m1n0 commented 1 year ago

Thank you for your reply!

Clarifying chemoinformatics terms

  • A reaction can be written as (based on Daylight "3. SMILES - A Simplified Chemical Language", https://daylight.com/dayhtml/doc/theory/theory.smiles.html, section "3.5 Extensions for Reactions"):
    • reactant > agent > product
    • reactant >> product
  • I also saw "reagent" somewhere, and it looks more like a substance used to detect or indicate a reaction (based on the definition in the IUPAC Compendium of Chemical Terminology, https://goldbook.iupac.org/terms/view/R05163). I assume that this term is not used in this tool (and is rarely used in other chemoinformatics tools, except for metadata).

So the "spectator" is part of the "agent"?

How to place a cofactor in chemoinformatics?

> If a co-factor is present on both sides with no modification, I would add it in as a post-processing step

There are modifications of the co-factor, as you can see from some examples in this plot:

  • NADH -> NAD+
  • ATP -> ADP
  • GTP -> GDP
  • ...etc

Do you think that these modifications (such as the addition of energy by deprotonation) can be excluded from template extraction?

[image: image] https://user-images.githubusercontent.com/17426349/257186605-9049eef0-ef2a-4038-81c7-a294beae22f8.png

Thank you, Min

connorcoley commented 1 year ago

The current template extraction pipeline will ignore anything that is not part of the reactants or products in a reaction SMILES. It will further ignore any component that does not change any of the properties of its atoms.

If you include NADH in the reactants and NAD+ in the products, and if they are atom mapped so the correspondence is clear, then the part of NADH that undergoes the change will be included in the template. If you want the whole molecule in the template, I would suggest adding it in a post-processing step or defining the whole cofactor as a “special group” in the code.

Whether the cofactor can be excluded is entirely up to you and your use case. If I were applying this to metabolic engineering or enzymatic retrosynthesis, I would treat cofactor selection as a distinct step. If I were engineering cascades and worrying about cofactor recycling, I would probably include them.
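To make the NADH/NAD+ point concrete, here is a rough sketch of how atom mapping exposes the part of a cofactor that changes. It is not RDChiral's implementation, which compares atoms via RDKit. It only compares the text of each mapped bracket atom before and after the reaction, a crude proxy that misses bond-only changes. The function names are hypothetical, and the reaction uses 1-methyl-1,4-dihydronicotinamide as a simplified stand-in for NADH (the proton is omitted for brevity).

```python
import re

# Captures the atom text and map number of bracket atoms such as "[CH2:8]".
ATOM_RE = re.compile(r"\[([^\]:]+):(\d+)\]")


def mapped_atoms(side_smiles):
    """Map each atom-map number to the bracket text of its atom."""
    return {int(num): text for text, num in ATOM_RE.findall(side_smiles)}


def changed_map_numbers(rxn_smiles):
    """Return map numbers whose atom text differs between the two sides.

    Crude proxy (assumption): a change in element case, hydrogen count or
    charge shows up in the bracket text; bond-only changes do not.
    """
    reactants, _agents, products = rxn_smiles.split(">")
    before, after = mapped_atoms(reactants), mapped_atoms(products)
    return sorted(n for n in before if n in after and before[n] != after[n])


# Acetaldehyde + simplified NADH stand-in -> ethanol + simplified NAD+ stand-in.
rxn = (
    "[CH3:1][CH:2]=[O:3]."
    "[CH3:4][N:5]1[CH:6]=[CH:7][CH2:8][C:9]([C:10]([NH2:11])=[O:12])=[CH:13]1"
    ">>"
    "[CH3:1][CH2:2][OH:3]."
    "[CH3:4][n+:5]1[cH:6][cH:7][cH:8][c:9]([C:10]([NH2:11])=[O:12])[cH:13]1"
)
print(changed_map_numbers(rxn))  # [2, 3, 5, 6, 7, 8, 9, 13]
```

In this toy case only the carbonyl of the substrate and the ring of the cofactor stand-in are flagged. The methyl and amide atoms (1, 4, 10, 11, 12) are untouched, which mirrors the idea that only the changing part of the cofactor would enter a template.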

0m1n0 commented 1 year ago

It's really kind of you to have replied so quickly, and it's very clear!

As I'm interested in the reaction chain (metabolic pathway), I'll proceed as follows:

  1. Annotate principal metabolites and co-factors and get canonical SMILES (using RDKit)
  2. Put them together in the reaction as reactants
  3. Atom numbering, i.e. atom mapping (I've used RXNMapper, but if you have another suggestion, I'd love to hear it.)
  4. Run RDChiral's template_extractor
  5. Analysis (I can re-index principal metabolites and co-factors here using step 1 and exclude co-factors if needed)
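The hand-off from step 3 to step 4 could look roughly like the sketch below. It assumes the atom-mapped reaction SMILES already exists (for example from RXNMapper) and that `rdchiral.template_extractor.extract_from_reaction` accepts a dict with the keys `_id`, `reactants` and `products`, which is my understanding of the current code; please check against the installed version. `to_extractor_record` and `extract_template` are hypothetical helper names, and the RDChiral import is deferred so the record-building part runs without RDKit installed.

```python
def to_extractor_record(rxn_id, mapped_rxn_smiles):
    """Turn a mapped 'reactants>agents>products' string into an extractor record.

    Agents are dropped on purpose: they are ignored during extraction anyway.
    """
    parts = mapped_rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    reactants, _agents, products = parts
    if not reactants or not products:
        raise ValueError("both reactants and products are required")
    return {"_id": rxn_id, "reactants": reactants, "products": products}


def extract_template(record):
    """Call RDChiral on one record (needs rdchiral and RDKit installed).

    Assumption: the result is a dict that carries the template under
    'reaction_smarts' when extraction succeeds.
    """
    from rdchiral.template_extractor import extract_from_reaction
    return extract_from_reaction(record)


# Hypothetical identifier and a small mapped oxidation as input.
record = to_extractor_record("rxn_1", "[CH3:1][CH2:2][OH:3]>>[CH3:1][CH:2]=[O:3]")
print(record["reactants"], "->", record["products"])
```

Keeping the principal metabolites and co-factors from step 1 in a separate lookup, keyed by the same reaction identifier, makes the later re-indexing in step 5 straightforward.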

Thank you, Min

connorcoley commented 1 year ago

I'm not positive what the "Analysis" step would involve for you, but this workflow sounds okay to me!

0m1n0 commented 1 year ago

The final aim would be to group different reactions by their template. Then, given a compound (i.e. reactant), find out whether there's a match in the templates in order to obtain a potential product (compound template).

So the main steps of analysis would be:

  1. Understand why some reactions do not yield templates. There are several possible causes: wrong annotation in public databases; non-canonical SMILES format; issues during atom numbering (by the way, I don't think RXNMapper works with the wildcard *); etc.
  2. Re-index principal metabolites and co-factors
  3. Group reactions by template (and probably by principal metabolites)
  4. Global analysis of reaction and template relationships (e.g. number of unique reactions by template)
  5. Check whether there is a hierarchy between templates (e.g. template B is a sub-group of template A)
  6. Given a compound X, explore which template it may belong to, then retrieve potential products
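Steps 1, 3, 4 and 6 of this analysis can be sketched as follows. The grouping part is plain Python and makes no assumption beyond "one template string, or `None`, per reaction". The last function assumes `rdchiral.main.rdchiralRunText(reaction_smarts, reactant_smiles)`, which to my understanding returns a list of outcome SMILES. The reaction identifiers and template strings are made up for illustration and are not real extractor output.

```python
from collections import defaultdict


def failed_reactions(extracted):
    """Step 1: reaction ids for which no template was obtained."""
    return [rxn_id for rxn_id, template in extracted if template is None]


def group_by_template(extracted):
    """Step 3: map each template string to the reaction ids that produced it."""
    groups = defaultdict(list)
    for rxn_id, template in extracted:
        if template is not None:
            groups[template].append(rxn_id)
    return dict(groups)


def template_counts(groups):
    """Step 4: number of reactions per template, most frequent first."""
    return sorted(((len(ids), t) for t, ids in groups.items()), reverse=True)


def propose_outcomes(template, compound_smiles):
    """Step 6: apply one template to a compound (needs rdchiral and RDKit).

    Assumption: check the direction of the template first; extracted
    templates may be written in the retrosynthetic direction.
    """
    from rdchiral.main import rdchiralRunText
    return rdchiralRunText(template, compound_smiles)


# Illustrative (reaction id, template) pairs; None marks a failed extraction.
extracted = [
    ("rxn_1", "[C:1]=[O:2]>>[C:1]-[OH:2]"),
    ("rxn_2", "[C:1]=[O:2]>>[C:1]-[OH:2]"),
    ("rxn_3", None),
    ("rxn_4", "[C:1]-[NH2:2]>>[C:1]=[O:2]"),
]
groups = group_by_template(extracted)
print(failed_reactions(extracted))  # ['rxn_3']
print(template_counts(groups)[0])   # (2, '[C:1]=[O:2]>>[C:1]-[OH:2]')
```

Grouping by the exact template string only finds identical templates. Detecting that one template is a sub-group of another (step 5) would need a substructure comparison of the patterns, which this sketch does not attempt.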

I saw ASKCOS and I think it's basically the same idea. But I wanted to understand the central part and I especially wanted to work with data from biochemical reactions.

connorcoley commented 1 year ago

Understood, thanks for the elaboration. You might be interested in some related work:

0m1n0 commented 1 year ago

Wow, thank you! I'll read it :)