Open kjappelbaum opened 1 year ago
If the goal is to go big, you could also consider sourcing the molecules from ZINC-22 instead of only Enamine. It has ~37 billion make-on-demand molecules from Enamine, WuXi, and Mcule.
This could be even easier to access through VirtualFlow 2.0: https://www.biorxiv.org/content/10.1101/2023.04.25.537981v1
They have already enumerated Enamine REAL Space and made the library available in PDB, PDBQT, MOL2, SDF, SMILES, SELFIES, and Parquet formats. They also calculated 18 molecular properties:
molecular weight, logP, hydrogen bond donor count, hydrogen bond acceptor count, rotatable bond count, topological polar surface area (TPSA), logS, aromatic ring count, molar refractivity (MR), formal charge, positive charge count, negative charge count, Fsp3, chiral center count, halogen atom count, sulfur atom count, and stereoisomer count.
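Since the properties are precomputed, a subset could in principle be selected by filtering on them directly, without running a cheminformatics toolkit over billions of molecules. Below is a minimal sketch of that idea in plain Python. The `Molecule` record, its field names, and the example values are all hypothetical placeholders, not the actual VirtualFlow schema or data, and the strict "all four criteria" rule-of-five check is just one possible filter.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record holding a few of the precomputed properties."""
    smiles: str
    mol_weight: float  # g/mol
    logp: float
    hbd: int           # hydrogen bond donor count
    hba: int           # hydrogen bond acceptor count


def passes_rule_of_five(m: Molecule) -> bool:
    """Strict variant of Lipinski's rule of five: all four criteria must hold."""
    return m.mol_weight <= 500 and m.logp <= 5 and m.hbd <= 5 and m.hba <= 10


def filter_druglike(records: Iterable[Molecule]) -> Iterator[Molecule]:
    """Stream through records and yield only those passing the filter."""
    for m in records:
        if passes_rule_of_five(m):
            yield m


# Illustrative, approximate values only (not taken from any library).
library = [
    Molecule("CC(=O)OC1=CC=CC=C1C(=O)O", 180.2, 1.2, 1, 4),  # aspirin
    Molecule("C" * 50, 703.3, 20.0, 0, 0),                   # long alkane: too heavy, too lipophilic
    Molecule("CCO", 46.1, -0.3, 1, 1),                       # ethanol
]

selected = [m.smiles for m in filter_druglike(library)]
print(selected)
```

Because `filter_druglike` is a generator, the same pattern would work when streaming records chunk by chunk from a large file rather than from an in-memory list.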
Their enumeration process expands the original 31.5B compounds to 68.7B.
https://discord.com/channels/850068776544108564/1080848065914753185/1087730198419607592