cdk / cdk

The Chemistry Development Kit
https://cdk.github.io/
GNU Lesser General Public License v2.1

Suggestion #440

Closed RicardoMBorges closed 6 years ago

RicardoMBorges commented 6 years ago

Hello, I’m very interested in this, but I’m quite new to programming and chemoinformatics. I believe I can find something here that will help me. I have an in-house database of SMILES structures in a CSV file, together with a lot of other information, including simulated activity/toxicity data. Does the CDK have the capability to calculate the structural similarity between those SMILES, so that I can load the result into Cytoscape for visualization?

Thank you

egonw commented 6 years ago

Did you see Chemviz2?

egonw commented 6 years ago

Yes, basically it does. You probably want to make a TSV file in which each line holds two compounds, their SMILES, and their similarity. You can calculate the similarity with the CDK, e.g. with an rcdk or Groovy script. I suggest keeping only the high-similarity pairs because, of course, you can link any SMILES to any other SMILES. Once you have loaded that file into Cytoscape, you can use Chemviz2 to visualize the compounds on the nodes. One thing you could check is whether Chemviz2 can calculate the similarities itself.
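
For illustration, the TSV file described above could be produced roughly as follows. This is only a sketch in plain Python, not CDK code. It assumes the fingerprints have already been computed elsewhere (for example with the CDK) and exported as sets of on-bit indices. The bit sets, the column names, and the 0.7 threshold below are made up for the example.

```python
import csv
import io
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, threshold=0.7):
    """Return (smiles_a, smiles_b, similarity) for pairs at or above threshold."""
    edges = []
    for (smi_a, fp_a), (smi_b, fp_b) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:  # keep only the high-similarity pairs
            edges.append((smi_a, smi_b, round(sim, 3)))
    return edges


def write_edges_tsv(edges, handle):
    """Write one edge per line: two SMILES and their similarity."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source_smiles", "target_smiles", "similarity"])  # hypothetical column names
    writer.writerows(edges)


# Toy, hand-made bit sets standing in for real fingerprints (assumption:
# real ones would come from a fingerprinter such as those in the CDK).
toy_fingerprints = {
    "CCO": {1, 2, 3, 4},
    "CCCO": {1, 2, 3, 4, 5},
    "c1ccccc1": {10, 11, 12},
}

buffer = io.StringIO()  # use open("edges.tsv", "w", newline="") for a real file
write_edges_tsv(similarity_edges(toy_fingerprints, threshold=0.7), buffer)
print(buffer.getvalue())
```

Each row of the resulting file is one edge between two compounds, with the similarity available as an edge attribute once the table is imported as a network. Raising or lowering the threshold controls how densely connected the network becomes.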