girke-lab / chemminetools

ChemMine Tools: open source web framework for small molecule analysis
http://chemmine.ucr.edu

Question about how to interpret data from Multidimensional Scaling Clustering Analysis #242

Open Gabe232 opened 1 year ago

Gabe232 commented 1 year ago

Hi,

I have a group of 16 compounds that I would like to run some basic unbiased clustering analysis on, to better understand whether they sub-group in any stand-out way in terms of chemical or structural properties. I tried the MDS clustering tool on ChemMine and found it very intuitive to set up and use. However, I am confused by the meaning of the readout, and the FAQ portion of ChemMine did not appear to explain it.

My clustering readout is a 2D plot, where I assume distance represents 'likeness', but I'm not sure what parameters are used to determine this distance (I'm guessing molecular weight is one, since the compounds are roughly ordered by molecular weight along the x-axis, but I'm wondering what else is factored in here). In addition to the distance between points, the points are also coloured according to the clusters they were assigned to based on my cut-off value of 0.4. Only 3 of my compounds occupy the same cluster/bin, and the rest each have their own. However, I'm wondering if there is a relative rank/order to these clusters. For example, are clusters 1 and 2 more similar than clusters 1 and 5? If so, then again, what are the parameters defining 'clusters' as opposed to distance on the plot?

Some of my compounds are far apart on my graph but coloured quite similarly. I'm trying to understand the relative meaning of cluster colour and physical distance (e.g. is distance a reflection of structural similarity, and colour a reflection of chemical properties? If so, which chemical properties?).

Appreciate any help whatsoever!

tgirke commented 1 year ago

If done the default way on the website, then the clustering is based on all-against-all structural similarity comparisons using 2D descriptors (here atom pairs). This generates a distance (1 - similarity) matrix that is used as input for the Multidimensional Scaling (MDS). The results are presented as a scatter plot similar to a PCA plot, where point-to-point distances reflect the similarities obtained in the structural comparisons (closer points are more similar). The cutoff you chose is used to create discrete clusters from the distance matrix (secondary binning clustering) in order to colour-label the data points in the MDS plot by similarity group. In this case, the MDS clustering is based on structural similarities only and not on physicochemical properties such as MW, logP, etc.
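For readers who find code easier to follow, the steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the ChemMine implementation: the descriptor sets are hypothetical stand-ins for real atom-pair descriptors, the function names (`tanimoto`, `distance_matrix`, `bin_clusters`, `classical_mds`) are made up for this example, Tanimoto is assumed as the similarity measure, the cutoff is assumed to act as a maximum distance with single-linkage binning, and the MDS step is a toy classical (Torgerson) MDS that assumes the leading eigenvalues are positive.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity of two descriptor sets (1.0 means identical sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def distance_matrix(descriptor_sets):
    """All-against-all distances, defined as 1 - similarity."""
    return [[1.0 - tanimoto(a, b) for b in descriptor_sets]
            for a in descriptor_sets]


def bin_clusters(dist, cutoff):
    """Assumed single-linkage binning: pairs within `cutoff` share a label."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= cutoff:
            parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(n)]


def classical_mds(dist, dims=2, iterations=500):
    """Classical (Torgerson) MDS using a toy power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Arbitrary non-constant start vector, normalised to unit length.
        v = [float(i + 1) for i in range(n)]
        length = sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract in this direction
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * gv[i] for i in range(n))
        scale = sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
            for j in range(n):
                gram[i][j] -= eigenvalue * v[i] * v[j]  # deflate found axis
    return coords


if __name__ == "__main__":
    # Hypothetical descriptor sets standing in for real atom-pair descriptors.
    compounds = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"},
                 {"x", "y"}, {"x", "y", "z"}]
    dist = distance_matrix(compounds)
    print(bin_clusters(dist, cutoff=0.5))  # colour label per compound
    print(classical_mds(dist))             # 2D plot coordinates per compound
```

Note that in this sketch the cluster labels are computed from the distance matrix itself, while the plot coordinates are a separate 2D approximation of the same matrix. The label numbers are just identifiers assigned in input order and carry no ranking.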

I hope this explains it.

T. Girke

On Fri, May 5, 2023 at 1:14 PM Gabe232 wrote: (original question, quoted in full above)

Gabe232 commented 1 year ago


Thanks so much for the help, Dr. Girke.

I tried using ChemMine to perform JoeLib analysis of my 16 compounds, followed by MDS clustering based on those results. I wanted to see whether the MDS plot of my 16 compounds would change when using JoeLib physicochemical descriptors instead of the structural atom-pair descriptors you mentioned are the default for MDS clustering of compounds in the workbench. However, both of my plots look exactly the same. Any thoughts on what I could be doing wrong? In both cases, my compounds align along the x-axis in order of their MW. Since MW is a physicochemical property, as you mentioned, I'm worried that my MDS clustering may be defaulting to physicochemical properties instead of structural ones.

Any help is appreciated again. Thanks so much!