Sure. For the first molecule:
SMILES: CCCC1=NN(C)C(NC(CN2C(NC3(C2=O)CCC(CC3)C)=O)=O)=C1
Substructure: O=CNC1=CC=NN1C
For the second molecule:
SMILES: CC(C)(C1=CC=C2OC=C(C2=C1)CC(NC3=CC=CC=C3F)=O)C
Substructure: C12=CC=CC=C1C(CCNC3=CC=CC=C3)=CO2
Hi! I tried to run the optimisation procedure with the code you provided in this issue on the first molecule and its substructure. However, none of the resulting molecules seems to have kept the substructure. The exact code I am running is attached below. Am I overlooking something?
Thanks in advance!
import torch
import selfies as sf
from rdkit import Chem
from tqdm import tqdm

# vae, dm, model, smiles_to_z and one_hot_to_smiles are assumed to be
# defined as in the code provided in this issue.

def create_mask(smile, substructure):
    orig_z = smiles_to_z([smile], vae)
    orig_x = torch.exp(vae.decode(orig_z))
    substruct = Chem.MolFromSmiles(substructure)
    selfies = list(sf.split_selfies(sf.encoder(smile)))
    n_symbols = len(dm.dataset.idx_to_symbol)
    mask = torch.zeros_like(orig_x)
    for i in range(len(selfies)):
        for j in range(n_symbols):
            # Swap token i for symbol j and check whether the substructure survives.
            changed = selfies.copy()
            changed[i] = dm.dataset.idx_to_symbol[j]
            m = Chem.MolFromSmiles(sf.decoder(''.join(changed)))
            # A molecule that fails to parse is treated as breaking the substructure.
            if m is None or not m.HasSubstructMatch(substruct):
                mask[0][i * n_symbols + j] = 1
    return mask, orig_z, orig_x

mask, orig_z, orig_x = create_mask(smile='CCCC1=NN(C)C(NC(CN2C(NC3(C2=O)CCC(CC3)C)=O)=O)=C1', substructure='O=CNC1=CC=NN1C')
z = orig_z.clone().detach().requires_grad_(True)
optimizer = torch.optim.Adam([z], lr=0.1)
target = orig_x.clone().detach()  # fixed reference for the penalty term
smiles = []
logps = []
for epoch in tqdm(range(50000)):
    optimizer.zero_grad()
    x = torch.exp(vae.decode(z))
    # Property objective plus a penalty on changes to the masked entries.
    loss = model(x) + 1000 * torch.sum(((x - target) * mask) ** 2)
    loss.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        # x, logp = get_logp(z)
        # logps.append(logp.item())
        smiles.append(one_hot_to_smiles(x))

# Check whether each sampled molecule still contains the substructure.
substruct = Chem.MolFromSmiles('O=CNC1=CC=NN1C')
for s in smiles:
    m = Chem.MolFromSmiles(s)
    print(m is not None and m.HasSubstructMatch(substruct))
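To make the mask indexing and the penalty term in the code above easier to inspect, here is a minimal plain-Python sketch. It is not the repository's implementation: `flat_index`, `masked_penalty`, the vocabulary size and the toy vectors are hypothetical stand-ins. It also assumes the decoder output is flattened position by position (row-major), which is what `i * len(dm.dataset.idx_to_symbol) + j` implies and would be worth verifying against the actual VAE output.

```python
# Illustrative only: plain-Python stand-ins for the tensors in the snippet above.

def flat_index(position, symbol_idx, vocab_size):
    """Index of (position, symbol) in a flattened, row-major one-hot vector."""
    return position * vocab_size + symbol_idx


def masked_penalty(x, orig_x, mask, weight=1000.0):
    """weight * sum(((x - orig_x) * mask) ** 2), written out element-wise."""
    return weight * sum(((a - b) * m) ** 2 for a, b, m in zip(x, orig_x, mask))


vocab_size = 3  # hypothetical alphabet size
seq_len = 2     # hypothetical number of SELFIES tokens

# Mark a single entry: symbol 1 at position 1 would break the substructure.
mask = [0.0] * (seq_len * vocab_size)
mask[flat_index(1, 1, vocab_size)] = 1.0

orig_x = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
moved_free = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0]    # changes position 0 only
moved_masked = [1.0, 0.0, 0.0, 0.0, 0.5, 0.5]  # puts weight on the masked entry

print(masked_penalty(moved_free, orig_x, mask))    # 0.0
print(masked_penalty(moved_masked, orig_x, mask))  # 250.0
```

Only entries where the mask is 1 contribute to the penalty, so a change at an unmasked position costs nothing, however large it is.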
There are two molecules shown in task 4.5. Could you please provide the SMILES of the original molecules and of the substructures for both of them?