Open DreamMemory001 opened 3 years ago
Hi,
unfortunately, there are many implementations of canonicalization. I do not remember exactly which program and version we used to make the picture. Generally, I use OpenBabel and RDKit. Nonetheless, the point is that you can make "canonicalized" and "random" SMILES with whatever software you are currently using. If you are consistent between training and prediction, you will get the results we described in the paper. The original code used RDKit, but then I started working on a standalone version for ordinary people. It turned out that using OpenBabel is much more convenient in that context.
Concerning the dataset, I can upload it somewhere. But you can make the dataset yourself easily.
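As a rough illustration of making the dataset yourself, here is a minimal sketch that builds "non-canonical >> canonical" lines with RDKit. It is not the code used for the paper. The function names (`rdkit_canonical`, `rdkit_random`, `build_pairs`) and the `n_random` parameter are made up for this example, and the ` >> ` separator is taken from the format quoted in this thread. Any other toolkit can be plugged in through the `canonicalize` and `randomize` arguments, as long as the same one is used for training and prediction.

```python
def rdkit_canonical(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    from rdkit import Chem  # imported lazily; only the RDKit helpers need it

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for SMILES it cannot parse
        return None
    return Chem.MolToSmiles(mol, canonical=True)


def rdkit_random(smiles):
    """Return one randomized (non-canonical) SMILES, or None on parse failure."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)


def build_pairs(smiles_list, canonicalize=rdkit_canonical,
                randomize=rdkit_random, n_random=5):
    """Build 'non-canonical >> canonical' lines for a list of input SMILES."""
    lines = []
    for smiles in smiles_list:
        canonical = canonicalize(smiles)
        if canonical is None:
            continue  # skip molecules the toolkit cannot parse
        variants = set()
        for _ in range(n_random):
            variant = randomize(smiles)
            # keep only distinct variants that differ from the canonical form
            if variant is not None and variant != canonical:
                variants.add(variant)
        for variant in sorted(variants):
            lines.append(f"{variant} >> {canonical}")
    return lines
```

With RDKit installed, `build_pairs(["CC(=O)Oc1ccccc1C(=O)O"])` returns up to five such lines for aspirin. The left-hand sides vary between runs because they are sampled randomly. A `None` result from either helper means the toolkit could not parse the input string.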
Thanks for your reply. I use rdkit version 2021.3.5, but when I plot the canonical SMILES from Fig. 1, it returns None, so I am confused.
Finally, I hope you can give me a link to the input dataset of your SMILES canonicalization model, because I would like to see the format of your datasets. Thank you very much.
I don't know if you saw my comment. If you have spare time, I hope you can give me a brief answer. Thank you very much.
First of all, I think this is fantastic work. I would like to ask a few questions about it:
i: In Fig. 1 of the paper, the canonical SMILES of benzylpenicillin is different from the one I get from the ChEMBL website. The same applies to Fig. 3: the canonical SMILES of CHEMBL351484 is different from the one on the ChEMBL website. When I use rdkit to generate these canonical SMILES, I get the same result as the ChEMBL website, so I am a little confused.
ii: Where can I find the input dataset of the SMILES canonicalization model, i.e. the 17,657,995 canonicalization pairs written in reaction format and separated by ' >> ', where each pair contains a non-canonical SMILES on the left side and a canonical SMILES for the same molecule on the right side?
I hope to get your reply. Thanks.