PattanaikL / GeoMol

MIT License

Questions about stereoisomer issues in the evaluation of GeoMol #6

Open qcxia20 opened 2 years ago

qcxia20 commented 2 years ago

https://github.com/PattanaikL/GeoMol/blob/5d0e85014a9546209d5b43861638caabb362ec25/scripts/compare_confs.py#L49-L56

https://github.com/PattanaikL/GeoMol/blob/5d0e85014a9546209d5b43861638caabb362ec25/model/featurization.py#L125-L126

| Setting | Recall Coverage (%) Mean / Median | Recall AMR (Å) Mean / Median | Precision Coverage (%) Mean / Median | Precision AMR (Å) Mean / Median |
|--|--|--|--|--|
| After (with `clean_confs`, more conformers are included than before) | 74.30 / 90.00 | 0.9489 / 0.8797 | 65.50 / 81.80 | 1.1044 / 1.0041 |
| `isomericSmiles=True` | 83.38 / 100.00 | 0.8233 / 0.8079 | 72.73 / 87.50 | 0.9833 / 0.8895 |
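For reference, each of the four metrics above can be computed per molecule from a matrix of RMSDs between reference and generated conformers, then summarized (mean/median) across molecules. Recall compares every reference conformer with its closest generated one, and precision swaps the roles. This is a minimal sketch assuming the RMSD matrix is already computed; it is not GeoMol's `compare_confs.py`, the helper names are hypothetical, and the threshold of 0.75 is only illustrative (the value GeoMol actually uses should be taken from its script).

```python
from statistics import mean, median


def coverage_and_amr(rmsd, threshold):
    """Recall/precision Coverage (%) and AMR for one molecule.

    rmsd[i][j] is the RMSD between reference conformer i and generated
    conformer j. A conformer counts as "covered" if its closest match on
    the other side is within `threshold` (same units as rmsd).
    """
    n_ref = len(rmsd)
    n_gen = len(rmsd[0])
    # Recall side: closest generated conformer for each reference conformer.
    best_per_ref = [min(row) for row in rmsd]
    # Precision side: closest reference conformer for each generated conformer.
    best_per_gen = [min(rmsd[i][j] for i in range(n_ref)) for j in range(n_gen)]
    return {
        "recall_coverage": 100.0 * sum(d <= threshold for d in best_per_ref) / n_ref,
        "recall_amr": mean(best_per_ref),
        "precision_coverage": 100.0 * sum(d <= threshold for d in best_per_gen) / n_gen,
        "precision_amr": mean(best_per_gen),
    }


def summarize(values):
    """Mean and median of a per-molecule metric across the test set."""
    return mean(values), median(values)


# Toy example: 3 reference conformers x 2 generated conformers.
toy = [[0.3, 1.2], [0.9, 0.6], [2.0, 1.5]]
print(coverage_and_amr(toy, threshold=0.75))
```

Because the recall metrics are averaged over reference conformers, changing which reference conformers are included for a molecule (as `clean_confs` or the SMILES flag does) changes the numbers even if the generated conformers stay the same.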


As the numbers show, with `isomericSmiles=True` the results reported in the GeoMol paper can be reproduced.
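The linked `compare_confs.py` lines are not quoted in this thread, so the following is only a toy illustration, assuming conformers are grouped by a SMILES key: once stereo markers are dropped (as with `isomericSmiles=False`), the cis and trans forms of a molecule collapse into a single group, and their reference conformers are pooled. `smiles_key` and `group_conformers` are hypothetical helpers; real code would canonicalize with RDKit's `Chem.MolToSmiles(mol, isomericSmiles=...)` instead of stripping characters.

```python
from collections import defaultdict

# Characters that encode stereochemistry in SMILES: double-bond
# directions ('/', '\') and tetrahedral centers ('@', '@@').
STEREO_CHARS = "/\\@"


def smiles_key(smiles, isomeric):
    """Hypothetical stand-in for canonical SMILES with or without stereo.

    Only strips stereo markers to mimic the non-isomeric case; it does not
    canonicalize the string the way RDKit would.
    """
    if isomeric:
        return smiles
    return "".join(ch for ch in smiles if ch not in STEREO_CHARS)


def group_conformers(records, isomeric):
    """Group (smiles, conformer_id) records by their SMILES key."""
    groups = defaultdict(list)
    for smiles, conf_id in records:
        groups[smiles_key(smiles, isomeric)].append(conf_id)
    return dict(groups)


records = [("C/C=C/C", "trans_0"), ("C/C=C/C", "trans_1"), ("C/C=C\\C", "cis_0")]
print(group_conformers(records, isomeric=True))   # trans and cis kept apart
print(group_conformers(records, isomeric=False))  # all three pooled under 'CC=CC'
```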
***
While digging further into this issue, I found another odd behavior: GeoMol generates conformers that are close in 3D geometry even when the input SMILES specify different stereoisomers. In other words, the conformers generated for the trans and the cis input are nearly the same, even though their SMILES describe different stereoisomers. RDKit ETKDG does not have this problem, and I am not sure whether it affects GeoMol's performance on these molecules. Here are two examples:
|SMILES| GeoMol (trans)  | GeoMol (cis) | ETKDG (trans) | ETKDG (cis) |
|--| -- | -- | -- | -- |
| O=S(=O)(_N=C(_c1ccccc1)N1CCOCC1)c1ccc(Br)cc1 |![image](https://user-images.githubusercontent.com/56123242/148317945-d967bc36-8813-4eb1-8597-a9fadb000e29.png)|![image](https://user-images.githubusercontent.com/56123242/148318189-ec89a81d-414e-48b9-8568-6ad7b62e483a.png) | ![image](https://user-images.githubusercontent.com/56123242/148321905-1fe23be1-da2c-4dc0-b1e3-65d50f50f498.png) | ![image](https://user-images.githubusercontent.com/56123242/148321930-68f49cfe-08bc-4607-a877-47947e5aec33.png)
| Cc1cc(C(=O)c2cnc(_N=C_N(C)C)s2)c(F)cc1Cl|![image](https://user-images.githubusercontent.com/56123242/148318227-6948f3de-876d-482d-a05f-c3ae8ac38100.png)|![image](https://user-images.githubusercontent.com/56123242/148318342-db154f0a-01be-4ae4-a0f5-811954750ad2.png) | ![image](https://user-images.githubusercontent.com/56123242/148321946-1cbb5f8f-e8ed-4429-af9f-c458137e3dbb.png) | ![image](https://user-images.githubusercontent.com/56123242/148321954-f94c5ddc-5c9a-433e-8db4-c4d8dcbb2237.png)
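One way to check whether a generated conformer actually has the cis/trans geometry requested by its input SMILES is to measure the dihedral angle across the double bond (the C=N bond in both examples above) using one chosen neighbor atom on each end. Below is a minimal pure-Python sketch; `dihedral` and `double_bond_config` are hypothetical helpers, and in practice the four positions would come from the generated conformer (RDKit also offers `rdMolTransforms.GetDihedralDeg` for the angle itself). Note that "cis"/"trans" here is relative to the two neighbors passed in, which is not necessarily the same as the CIP-based E/Z label.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) of p0-p1-p2-p3; p1-p2 is the central bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1_len = math.sqrt(_dot(b1, b1))
    if b1_len == 0:
        raise ValueError("central bond has zero length")
    b1u = [c / b1_len for c in b1]
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1u) * c for c in b1u])
    w = _sub(b2, [_dot(b2, b1u) * c for c in b1u])
    x = _dot(v, w)
    y = _dot(_cross(b1u, v), w)
    return math.degrees(math.atan2(y, x))


def double_bond_config(p0, p1, p2, p3):
    """'cis' if neighbors p0 and p3 lie on the same side of the p1=p2 bond, else 'trans'."""
    return "cis" if abs(dihedral(p0, p1, p2, p3)) < 90.0 else "trans"


# Idealized planar geometries around a double bond along the x axis.
print(double_bond_config((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
print(double_bond_config((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Applied to the GeoMol and ETKDG conformers for both inputs in the table, a check like this would quantify whether the generated geometry follows the stereo in the input SMILES, rather than relying on visual inspection.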