Open HenryJia opened 1 year ago
Did a bit more digging: it seems that the first molecule is actually acebutolol, according to ChemSpider http://www.chemspider.com/Chemical-Structure.1901.html?rid=c3544756-c17f-41b3-b7a6-9b1ec7325512
Something else to add to this: it seems like there are 2 instances of Tiotidine:
62,22767,1,c1(nc(NC(N)=[NH2])sc1)CSCCNC(=[NH]C#N)NC
and
384,Tiotidine,1,CN=C(NCCSCc1csc(N=C(N)N)n1)NC#N
Both are the same molecule according to ChemSpider http://www.chemspider.com/Chemical-Structure.45601.html?rid=4afdeaf4-b290-4375-b966-11b94813582e
There are also 2 instances of Homophenazine/Homofenazine. These molecules are indeed identical when viewed using RDKit and VMD (row 660 additionally carries two H+/Cl- counterions):
1864,homofenazine(homophenazine),1,C1=C(C(F)(F)F)C=CC3=C1N(C2=C(C=CC=C2)S3)CCCN4CCN(CCO)CCC4
660,homophenazine,1,[H+].[H+].[Cl-].[Cl-].OCCN1CCCN(CCCN2c3ccccc3Sc4cc(ccc24)C(F)(F)F)CC1
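A name-based check can surface this kind of duplicate before any structure comparison. The sketch below is only an illustration: the column names `num`, `name`, `p_np`, `smiles` and the helpers `name_aliases` and `rows_sharing_alias` are assumptions made for this example, not part of DeepChem. It treats `a(b)` as two aliases of one compound, which would mis-split systematic names that contain parentheses.

```python
import csv
import io
import re
from collections import defaultdict

# The two rows quoted above; the header line is an assumption for this sketch.
ROWS = """num,name,p_np,smiles
1864,homofenazine(homophenazine),1,C1=C(C(F)(F)F)C=CC3=C1N(C2=C(C=CC=C2)S3)CCCN4CCN(CCO)CCC4
660,homophenazine,1,[H+].[H+].[Cl-].[Cl-].OCCN1CCCN(CCCN2c3ccccc3Sc4cc(ccc24)C(F)(F)F)CC1
"""


def name_aliases(name):
    """Split a name such as 'a(b)' into the lowercased alias set {'a', 'b'}."""
    parts = re.split(r"[()]", name)
    return {p.strip().lower() for p in parts if p.strip()}


def rows_sharing_alias(csv_text):
    """Map each alias to the row numbers using it, keeping only shared aliases."""
    by_alias = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for alias in name_aliases(row["name"]):
            by_alias[alias].append(row["num"])
    return {alias: nums for alias, nums in by_alias.items() if len(nums) > 1}


if __name__ == "__main__":
    print(rows_sharing_alias(ROWS))
```

This would not catch the Tiotidine pair, where one row is named only by a number; that case needs a structure-level comparison such as the RDKit/VMD check mentioned above.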
🐛 Bug
In addition to https://github.com/deepchem/deepchem/issues/2336, there appear to be 2 Bacitracin entries in the BBBP dataset, with different SMILES. One is labelled BBBP+ and the other BBBP-.
55,Bacitracin,1,c1(c(cc(NC(CCC)=O)cc1)C(C)=O)OCC(CNC(C)C)O
and
202,Bacitracin,0,CCC(C)C(N)C1=NC(CS1)C(=O)NC@@HC(=O)NC@HC(=O)NC@@HC(=O)NCCCC[C@@H]2NC(=O)C@HNC(=O)C@@HNC(=O)C@HNC(=O)C@@HNC(=O)C@@HC@@HCC
Clearly, only one of those can be Bacitracin, as the two structures are very different.
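Conflicts like this one, where the same name appears with both labels, can be listed mechanically. The following is a minimal sketch, assuming each row has been reduced to a `(num, name, p_np)` tuple; the helper `conflicting_labels` is hypothetical and not a DeepChem function. SMILES are left out because this check only compares names and labels.

```python
from collections import defaultdict

# (num, name, p_np) for rows quoted in this thread.
ROWS = [
    (55, "Bacitracin", 1),
    (202, "Bacitracin", 0),
    (384, "Tiotidine", 1),
]


def conflicting_labels(rows):
    """Return {name: {label: [row numbers]}} for names seen with more than one label."""
    seen = defaultdict(lambda: defaultdict(list))
    for num, name, label in rows:
        # Compare names case-insensitively and ignore stray whitespace.
        seen[name.strip().lower()][label].append(num)
    return {
        name: dict(labels)
        for name, labels in seen.items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    for name, labels in conflicting_labels(ROWS).items():
        print(name, labels)
```

A name flagged this way still needs manual review, since the check says nothing about which row carries the wrong name or the wrong label.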