OpenSourceMalaria / Series4_PredictiveModel

Can we Predict Active Compounds in OSM Series 4?

Changes in SMILES code in the Master Chemical List? #31

Open jonjoncardoso opened 3 years ago

jonjoncardoso commented 3 years ago

Hi everyone,

Our group at the Department of Informatics at King's College London - under Dr. Sophia Tsoka @sophiatsoka - have been revisiting this modelling challenge and we have some questions about changes in SMILES codes in the Master Chemical List.

Ruby (@yutongLi1997) has downloaded the newest version of the master list and compared it with the previous version I had from when I participated in Round #2 of the Competition.

She noticed that the structures listed below are slightly different in the new version. My guess is that these compounds had incorrect SMILES that were revised more recently, but I couldn't locate the changes in the spreadsheet. Can anyone confirm this?

# OSM Codes:
['OSM-S-82', 'OSM-S-88', 'OSM-S-89', 'OSM-S-351', 'OSM-S-546', 'OSM-S-631']

#old_smiles:
 ['CNC(=O)COC(=O)c1cc(C)n(c1C)c2ccc(F)cc2',
 'CNC(=O)CN1CCC(CC1)NCc2cc(C)n(c2C)c3ccc(Cl)cc3',
 'CCNC(=O)[C@@H]1C[C@@H](N)CN1Cc2cc(C)n(c2C)c3ccccc3Cl',
 'Clc1cccc(c1Cl)c2nnc3cncc(OCCc4ccccc4)n23',
 'Fc1ccc(CCOc2cncc3nnc(c4ccc5c[nH]nc5c4)n23)cc1F',
 'COc1ccc(cc1)c2n[nH]c(n2)c3nccn3CCc4ccccc4']

#new_smiles:
['CC1=CC(=C(C)[N]1C2=CC=C(C=C2)F)C(=O)OCC(=NC)O',
 'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)Cl)CNC3CCN(CC3)CC(=NC)O',
 'CCN=C([C@@H]1C[C@H](CN1CC2=C(C)[N](C(=C2)C)C3=CC=CC=C3Cl)N)O',
 'ClC1=CC(Cl)=C(C2=NN=C3C=NC=C(N32)OCCC4=CC=CC=C4)C=C1',
 'FC1=CC(CCOC2=CN=CC3=NN=C(C4=CC=C5C(NN=C5)=C4)N32)=CC=C1F',
 'COC(C=C1)=CC=C1C2=NN=C(N2)C3=NC=CN3CCC4=CC=CC=C4']

PS: What we have been up to

mattodd commented 3 years ago

No idea, sorry. Do the SMILES resolve to different structures, or are they representing the same molecules? @edwintse any idea? Very good you're tweaking and improving. Still very keen on improving these compounds' potency.

edwintse commented 3 years ago

@jonjoncardoso I've had a check and it looks like the differences between the old and new SMILES are just in the way the aromatic rings are represented (i.e. using a circle rather than the Kekulé form). SMILES are always very dependent on how you draw the structure out, so the InChI and InChIKey are more consistent.

The only exception is OSM-S-351 which was changed because the old strings were incorrect (i.e. should be 2,4-Cl instead of 2,3-Cl).
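As a rough illustration of why raw string comparison flags all six entries, the sketch below counts heavy atoms per element in each old/new pair using only the Python standard library. It is a coarse sanity check, not a canonicalization: it assumes the strings contain only B, C, N, O, P, S, F, Cl, Br and I, and it ignores bonds, hydrogens, stereochemistry and connectivity. The names `ATOM_RE` and `heavy_atom_counts` are hypothetical helpers introduced here. A proper identity check would still need a cheminformatics toolkit that can generate canonical SMILES or InChIKeys.

```python
import re
from collections import Counter

# Two-letter halogens first, then single-letter atoms:
# upper case = aliphatic/Kekule notation, lower case = aromatic notation.
# Assumption: only these elements occur in the strings being compared.
ATOM_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count heavy atoms per element; bonds, ring digits, H and chirality marks are ignored."""
    return Counter(s if len(s) == 2 else s.upper() for s in ATOM_RE.findall(smiles))


codes = ['OSM-S-82', 'OSM-S-88', 'OSM-S-89', 'OSM-S-351', 'OSM-S-546', 'OSM-S-631']

old_smiles = [
    'CNC(=O)COC(=O)c1cc(C)n(c1C)c2ccc(F)cc2',
    'CNC(=O)CN1CCC(CC1)NCc2cc(C)n(c2C)c3ccc(Cl)cc3',
    'CCNC(=O)[C@@H]1C[C@@H](N)CN1Cc2cc(C)n(c2C)c3ccccc3Cl',
    'Clc1cccc(c1Cl)c2nnc3cncc(OCCc4ccccc4)n23',
    'Fc1ccc(CCOc2cncc3nnc(c4ccc5c[nH]nc5c4)n23)cc1F',
    'COc1ccc(cc1)c2n[nH]c(n2)c3nccn3CCc4ccccc4',
]

new_smiles = [
    'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)F)C(=O)OCC(=NC)O',
    'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)Cl)CNC3CCN(CC3)CC(=NC)O',
    'CCN=C([C@@H]1C[C@H](CN1CC2=C(C)[N](C(=C2)C)C3=CC=CC=C3Cl)N)O',
    'ClC1=CC(Cl)=C(C2=NN=C3C=NC=C(N32)OCCC4=CC=CC=C4)C=C1',
    'FC1=CC(CCOC2=CN=CC3=NN=C(C4=CC=C5C(NN=C5)=C4)N32)=CC=C1F',
    'COC(C=C1)=CC=C1C2=NN=C(N2)C3=NC=CN3CCC4=CC=CC=C4',
]

# Codes whose heavy-atom composition differs between the two versions.
mismatches = [
    code
    for code, old, new in zip(codes, old_smiles, new_smiles)
    if heavy_atom_counts(old) != heavy_atom_counts(new)
]
print(mismatches)
```

Every string differs textually from its counterpart, yet this check reports no composition mismatches. That includes OSM-S-351: moving a chlorine from the 3- to the 4-position leaves the element counts unchanged, so equal composition is necessary but not sufficient for two SMILES to describe the same molecule.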

jonjoncardoso commented 3 years ago

Thanks @mattodd and @edwintse, we will test with InChI/InChIKeys to make sure our modelling is consistent.

Indeed some of the SMILES do resolve to slightly different structures.

Here are 2D visualizations of these structures (['OSM-S-82', 'OSM-S-88', 'OSM-S-89', 'OSM-S-351', 'OSM-S-546', 'OSM-S-631']). The molecule constructed from the old SMILES is displayed on the left and the one from the new SMILES on the right. (OSM-S-351 is displayed correctly on the right, as pointed out by Edwin.)

[Image: old_vs_new]