Rose-STL-Lab / LIMO

generative model for drug discovery

Smiles for Substructure-constrained logP #9

merdivane closed this issue 1 year ago

merdivane commented 1 year ago

There are two molecules shown in task 4.5. Could you please provide the SMILES for both original molecules and their substructures?

PeterEckmann1 commented 1 year ago

Sure, the SMILES for the first molecule is CCCC1=NN(C)C(NC(CN2C(NC3(C2=O)CCC(CC3)C)=O)=O)=C1 with the substructure O=CNC1=CC=NN1C, and the SMILES for the second molecule is CC(C)(C1=CC=C2OC=C(C2=C1)CC(NC3=CC=CC=C3F)=O)C with the substructure C12=CC=CC=C1C(CCNC3=CC=CC=C3)=CO2.
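A quick way to confirm that each substructure above is present in its parent molecule is an RDKit substructure query. This is a minimal sketch that assumes RDKit is installed. `contains_substructure` and `PAIRS` are illustrative names, not part of LIMO. Because the query is built with `Chem.MolFromSmiles`, extra substituents on the parent (the propyl group on the pyrazole, the fluorine on the phenyl ring) do not prevent a match.

```python
from rdkit import Chem

# Molecule / substructure pairs quoted in the reply above.
PAIRS = [
    ("CCCC1=NN(C)C(NC(CN2C(NC3(C2=O)CCC(CC3)C)=O)=O)=C1", "O=CNC1=CC=NN1C"),
    ("CC(C)(C1=CC=C2OC=C(C2=C1)CC(NC3=CC=CC=C3F)=O)C",
     "C12=CC=CC=C1C(CCNC3=CC=CC=C3)=CO2"),
]


def contains_substructure(smiles: str, substructure: str) -> bool:
    """Return True if `substructure` (given as SMILES) occurs in `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmiles(substructure)
    if mol is None or query is None:  # unparsable input counts as "no match"
        return False
    return mol.HasSubstructMatch(query)


for smiles, sub in PAIRS:
    print(smiles, contains_substructure(smiles, sub))
```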

MarieOestreich commented 1 year ago

Hi! I tried running the optimisation procedure with the code you provided in this issue on the first molecule and its substructure. However, none of the resulting molecules seem to retain the substructure. The exact code I am running is below. Am I overlooking something?

Thanks in advance!

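import torch
import selfies as sf
from rdkit import Chem

# vae, dm, model, smiles_to_z and one_hot_to_smiles are assumed to be defined earlier (LIMO setup).
def create_mask(smile, substructure):
  # Encode the starting molecule and decode it back to per-symbol probabilities.
  orig_z = smiles_to_z([smile], vae)
  orig_x = torch.exp(vae.decode(orig_z))
  substruct = Chem.MolFromSmiles(substructure)
  selfies = list(sf.split_selfies(sf.encoder(smile)))
  vocab_size = len(dm.dataset.idx_to_symbol)
  mask = torch.zeros_like(orig_x)
  # Flag every (position, symbol) substitution that would destroy the substructure.
  for i in range(len(selfies)):
    for j in range(vocab_size):
      changed = selfies.copy()
      changed[i] = dm.dataset.idx_to_symbol[j]
      m = Chem.MolFromSmiles(sf.decoder(''.join(changed)))
      # Treat an undecodable result as breaking the substructure too.
      if m is None or not m.HasSubstructMatch(substruct):
        mask[0][i * vocab_size + j] = 1
  return mask, orig_z, orig_x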
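The masking rule inside `create_mask` can be checked in isolation from the VAE and RDKit. The pure-Python sketch below assumes, as the indexing `i * len(dm.dataset.idx_to_symbol) + j` implies, that the decoder output is flattened position-major (all symbols for position 0, then all symbols for position 1, and so on). `build_flat_mask`, the toy vocabulary and the `keeps_substructure` predicate are hypothetical stand-ins for the SELFIES alphabet and `HasSubstructMatch`.

```python
from typing import Callable, List, Sequence


def build_flat_mask(
    tokens: Sequence[str],
    vocab: Sequence[str],
    max_len: int,
    keeps_substructure: Callable[[List[str]], bool],
) -> List[int]:
    """Flat 0/1 mask of length max_len * len(vocab).

    Entry i * len(vocab) + j is 1 when replacing token i with vocab[j]
    makes keeps_substructure() return False (same rule as create_mask).
    """
    if len(tokens) > max_len:
        # The flat index would run past the decoder output.
        raise ValueError("sequence is longer than the decoder's max_len")
    mask = [0] * (max_len * len(vocab))
    for i in range(len(tokens)):
        for j, symbol in enumerate(vocab):
            changed = list(tokens)
            changed[i] = symbol
            if not keeps_substructure(changed):
                mask[i * len(vocab) + j] = 1
    return mask


toy_vocab = ["[C]", "[N]", "[O]"]
toy_mask = build_flat_mask(
    ["[C]", "[N]", "[O]"], toy_vocab, max_len=4,
    keeps_substructure=lambda toks: "[N]" in toks,  # toy "substructure": any nitrogen
)
print(toy_mask)
```

A 1 at index `i * V + j` marks "symbol j at position i would break the substructure", so the penalty term pushes those entries of `x` back toward their original values. Positions at or beyond `len(tokens)` stay 0 and are therefore left unconstrained.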

mask, orig_z, orig_x = create_mask(smile='CCCC1=NN(C)C(NC(CN2C(NC3(C2=O)CCC(CC3)C)=O)=O)=C1', substructure='O=CNC1=CC=NN1C')

from tqdm import tqdm

z = orig_z.clone().detach().requires_grad_(True)
optimizer = torch.optim.Adam([z], lr=0.1)
smiles = []
logps = []
for epoch in tqdm(range(50000)):
    optimizer.zero_grad()
    x = torch.exp(vae.decode(z))
    # Property loss plus a heavy penalty on changes to the masked entries.
    loss = model(x) + 1000 * torch.sum(((x - orig_x.detach()) * mask) ** 2)
    loss.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        # x, logp = get_logp(z)
        # logps.append(logp.item())
        smiles.append(one_hot_to_smiles(x))

# Check whether each sampled molecule still contains the substructure.
substruct = Chem.MolFromSmiles('O=CNC1=CC=NN1C')
for s in smiles:
    m = Chem.MolFromSmiles(s)
    print(m is not None and m.HasSubstructMatch(substruct))
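The loss in the loop above uses a quadratic penalty, `1000 * sum(((x - orig_x) * mask) ** 2)`, rather than a hard constraint. The toy sketch below (plain gradient descent in pure Python, not the Adam/VAE setup from the post; the quadratic "property" term and its `target` are hypothetical) shows what that implies: a masked coordinate settles at `(t + λ·x0) / (1 + λ)` instead of exactly `x0`, i.e. it drifts by `(x0 - t) / (1 + λ)`, and that drift grows as the property term's pull grows relative to `λ`.

```python
from typing import List, Sequence


def penalized_descent(
    x0: Sequence[float],
    target: Sequence[float],
    mask: Sequence[int],
    lam: float,
    lr: float,
    steps: int,
) -> List[float]:
    """Plain gradient descent on
        sum_k (x_k - target_k)**2 + lam * sum_k mask_k * (x_k - x0_k)**2
    starting from x0. The first term is a toy stand-in for the property loss."""
    x = list(x0)
    for _ in range(steps):
        grad = [
            2 * (xk - tk) + 2 * lam * mk * (xk - ok)
            for xk, tk, mk, ok in zip(x, target, mask, x0)
        ]
        x = [xk - lr * g for xk, g in zip(x, grad)]
    return x


def penalized_optimum(x0, target, mask, lam):
    """Closed-form per-coordinate minimiser: (t + lam*m*x0) / (1 + lam*m)."""
    return [(t + lam * m * o) / (1 + lam * m) for o, t, m in zip(x0, target, mask)]


# Coordinate 0 is masked, coordinate 1 is free; the toy property wants both at 0.
# lr must stay below 1 / (1 + lam) for the masked coordinate to converge.
x_final = penalized_descent([1.0, 1.0], [0.0, 0.0], [1, 0], lam=1000.0, lr=4e-4, steps=20000)
print(x_final, penalized_optimum([1.0, 1.0], [0.0, 0.0], [1, 0], 1000.0))
```

With λ = 1000 the masked entry ends at about 0.999 rather than 1.0, while the unmasked entry moves all the way to its target.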