MolecularAI / REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Apache License 2.0

Unexpected behaviour with MatchingSubstructure. #90

Closed: Apl0x closed this issue 5 months ago

Apl0x commented 5 months ago

I have a very simple matching substructure query: [#7][#6][#6][#6][#7]

Defined as a component as follows:

MatchingSubstructure_ncccn = f"""
[[stage.scoring.component]]
[stage.scoring.component.MatchingSubstructure]
# Note: this is a penalty component and is applied multiplicatively after all others.
[[stage.scoring.component.MatchingSubstructure.endpoint]]
name = "NCCCN Matching SMARTs substructure"
weight = 1
params.smarts = "[#7][#6][#6][#6][#7]"
params.use_chirality = false
"""

Unfortunately, I am unable to share the compounds this has produced, but I can assure you that they all contain a nitrogen followed by three carbons and then another nitrogen (usually something like the SMILES "NCCC1CCCN1"). However, the score returned is 0.5 rather than 1.

This is using REINVENT 4.3.5 and an agent from a transfer learning run. The other scoring components appear to be functioning as expected.

halx commented 5 months ago

Hi Alex,

MatchingSubstructure is a penalty term, so it will assign a multiplier of 0.5 for every matching compound. That may be contrary to expectations, but you can reverse this simply with a step function if you wish.

Many thanks, Hannes.
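
To make the reply above concrete, here is a minimal sketch of how a multiplicative penalty term and a step function could interact. It is not REINVENT's implementation: `apply_penalty` and `left_step` are hypothetical helpers, the 0.75 threshold is an arbitrary choice, and the raw values 0.5 and 1.0 are taken only from the description in this thread.

```python
def apply_penalty(aggregated_score: float, penalty: float) -> float:
    """Scale the combined score of the other components by a penalty factor.

    Hypothetical helper: mirrors the "applied multiplicatively after all
    others" note in the config comment, nothing more.
    """
    return aggregated_score * penalty


def left_step(value: float, threshold: float) -> float:
    """Hypothetical step transform: 1.0 at or below the threshold, else 0.0."""
    return 1.0 if value <= threshold else 0.0


# Assumed raw component values, as described in the thread.
raw_values = {"reported as 0.5": 0.5, "reported as 1.0": 1.0}

other_components = 0.8  # made-up aggregate of the remaining components

for label, raw in raw_values.items():
    penalised = apply_penalty(other_components, raw)
    remapped = apply_penalty(other_components, left_step(raw, threshold=0.75))
    print(f"{label}: untransformed={penalised}, after step={remapped}")
```

With a threshold between the two raw values, the step function swaps which group keeps its full score. Note that in this sketch the other group is multiplied by 0.0 rather than 0.5, so the exact transform and its parameters should be checked against the REINVENT documentation before use.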

Apl0x commented 5 months ago

Hi Hannes, sorry, yes, I closed the issue before you replied. I was hasty in submitting this and solved the problem before your response.

I will be more diligent next time. Sorry!