Open Lyq322 opened 1 week ago
I could not find the R/S configuration for this molecule:
'd-10-acetoxy-cis-7-hexadecen-1-ol': 'OCCCCCC\C=C/CC(OC(=O)C)CCCCCC'
Also, should I change the (+)/(-) and d/l in the SMILES list to R/S so it is more consistent and easier to understand?
I calculated the Tanimoto similarity scores between this list and one of the tranches in the ZINC database (AAAA). I found that most of the molecules in this list are not similar to any of the molecules in that tranche. The maximum Tanimoto score was 0.3137, between these two molecules: From ZINC database: From SMILES list:
Interesting, nice plot! It's probably because their combination algorithms are more drug-design based and less applicable to other chemical spaces. Tanimoto scoring is pretty strict:
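For anyone following along, the "maximum score between two collections" search described above can be sketched in plain Python. This is only an illustration, not the script that was actually run: the fingerprints here are hypothetical toy sets of on-bit indices rather than real RDKit fingerprints, and the names `best_match`, `pheromones`, and `tranche` are made up for the example.

```python
# Toy sketch: fingerprints are modelled as sets of "on" bit indices.
# All molecule names and bit sets below are hypothetical placeholders.

def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def best_match(queries, references):
    """Return (score, query_name, reference_name) for the most similar pair."""
    return max(
        (tanimoto(q_bits, r_bits), q_name, r_name)
        for q_name, q_bits in queries.items()
        for r_name, r_bits in references.items()
    )

pheromones = {"query_1": {1, 4, 7, 9}, "query_2": {2, 3, 8}}
tranche = {"ref_1": {1, 4, 20, 21, 22}, "ref_2": {3, 30, 31}}

score, query_name, ref_name = best_match(pheromones, tranche)
print(f"max Tanimoto {score:.4f}: {query_name} vs {ref_name}")
```

With real data the two dictionaries would be replaced by fingerprints generated from the SMILES list and from the tranche; the pairwise loop stays the same.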
from rdkit import DataStructs

# fp, ref_fps, criteria and value are assumed to be defined earlier in the script.
tanimoto_scores = DataStructs.BulkTanimotoSimilarity(fp, ref_fps)
dice_scores = DataStructs.BulkDiceSimilarity(fp, ref_fps)
kulczynski_scores = DataStructs.BulkKulczynskiSimilarity(fp, ref_fps)
mcconnaughey_scores = DataStructs.BulkMcConnaugheySimilarity(fp, ref_fps)
onbit_scores = DataStructs.BulkOnBitSimilarity(fp, ref_fps)
rogot_goldberg_scores = DataStructs.BulkRogotGoldbergSimilarity(fp, ref_fps)
russel_scores = DataStructs.BulkRusselSimilarity(fp, ref_fps)
sokal_scores = DataStructs.BulkSokalSimilarity(fp, ref_fps)

# A metric "accepts" the molecule only if every reference score exceeds the cutoff.
if all(x > criteria for x in tanimoto_scores):
    print('Tanimoto Accepted: %s' % value)
if all(x > criteria for x in dice_scores):
    print('Dice Accepted: %s' % value)
if all(x > criteria for x in kulczynski_scores):
    print('Kulczynski Accepted: %s' % value)
if all(x > criteria for x in mcconnaughey_scores):
    print('Mcconnaughey Accepted: %s' % value)
if all(x > criteria for x in onbit_scores):
    print('On Bit Accepted: %s' % value)
if all(x > criteria for x in rogot_goldberg_scores):
    print('Rogot Goldberg: %s' % value)
if all(x > criteria for x in russel_scores):
    print('Russel: %s' % value)
if all(x > criteria for x in sokal_scores):
    print('Sokal: %s' % value)
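To make the "Tanimoto is strict" point concrete, here is a small dependency-free sketch comparing Tanimoto and Dice on the same pair of fingerprints. The bit sets and the 0.4 cutoff are hypothetical values picked for illustration, not numbers from the thread.

```python
# Toy sketch: fingerprints as sets of "on" bit indices (hypothetical values).

def tanimoto(a, b):
    """Shared on-bits divided by the union of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dice(a, b):
    """Twice the shared on-bits divided by the total on-bit count."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

fp_a = {1, 2, 3, 4, 5, 6}
fp_b = {4, 5, 6, 7, 8, 9}

t = tanimoto(fp_a, fp_b)
d = dice(fp_a, fp_b)
criteria = 0.4  # hypothetical acceptance cutoff

print(f"Tanimoto={t:.3f} accepted={t > criteria}")
print(f"Dice={d:.3f} accepted={d > criteria}")
```

For any pair, Dice equals 2T / (1 + T), so it is never lower than Tanimoto. A fixed cutoff therefore rejects more molecules under Tanimoto than under Dice, which is worth keeping in mind when reading the acceptance checks above.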
There are a bunch of other similarity metrics that could be useful as well, but at first glance they don't look great. Can we compare against a fragrance database? The problem is that the data is usually sold rather than available as open source:
I found this. Is there anything in here we can use?
@ANUGAMAGE Review this PR and add it as a node in global-chem. This will bump the version as well, so we can do a new release.
Is this molecule incomplete? The -yl suffix makes me think it's an ester. When I google the molecule, I also get results saying that the cyclopropyl propanoate ester, not the cyclopropane, is the pheromone of the American cockroach.
Maybe I wrote it wrong? I will check again on it. There's a little arrow and a star that says "Maybe not work". Idk what I meant there.
Add chemicals from Insect Sex Pheromones by Martin Jacobson to GlobalChem