SofieVG / FlowSOM

Using self-organizing maps for visualization and interpretation of cytometry data

Clusters with similar marker expression level are not clustered in the same metacluster #62

Open ha-le-git opened 1 year ago

ha-le-git commented 1 year ago

Hello Sophie!

Thank you for a great package!

I have a question regarding the optimization of FlowSOM clustering. I set a seed, used an 11x12 grid, and varied the nClus parameter to test different numbers of metaclusters. One issue I observed is that two clusters that are next to each other on the minimal spanning tree (MST) AND have very similar marker expression profiles can be assigned to two different metaclusters. Conversely, clusters that are far apart on the MST can be assigned to the same metacluster. For example, in the figure attached below, metaclusters 1, 2, 10, and 14 are each split into parts that lie far apart on the MST.

I'm wondering what could explain this, and how I should tune the parameters to get better clustering. (I tried setting maxMeta very high to get more well-defined metaclusters, but the issue remained, and the number of metaclusters returned was too high for annotation.)

Thank you so much and I'm looking forward to your response!

image

Best regards,

Ha Le

SamGG commented 1 year ago

Hi, could you provide the heatmap of the 132 clusters, masking the marker names? Best.

SofieVG commented 1 year ago

Hi Ha Le,

This can indeed happen sometimes. The first thing to understand is that a minimal spanning tree by definition cannot contain any loops, so even nodes that are quite close in the high-dimensional space can end up far apart in the tree. All the tree tells you in that case is that another node was even closer.

Secondly, the discrepancy with the metaclustering probably arises because that step uses a different algorithm: hierarchical clustering with average linkage. When combining clusters, it considers the average distance between multiple nodes rather than just the closest node. Single linkage might give you results that correspond better to what the tree shows.

However, when looking at the marker expression (e.g. in the heatmap, as Samuel suggested), we often see that there is good reason for such a metaclustering, even when it doesn't look intuitive on the tree. That is simply one of the limitations of visualizing high-dimensional data in 2D. So I would recommend checking the heatmaps and scatter plots and deciding on your final metaclusters based on that information. If need be, you can also adapt your metaclustering labels manually with UpdateMetaclusters.
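To make the two effects concrete, here is a minimal toy sketch in plain Python. It is not FlowSOM code and does not reproduce FlowSOM's metaclustering implementation: the six 2D points are made-up stand-ins for cluster centers, and `mst_edges`, `tree_hops`, and `agglomerate` are hypothetical helpers written only for this illustration. It builds an MST over the points, then groups the same points into two "metaclusters" with single and with average linkage.

```python
import math
from itertools import combinations

# Hypothetical cluster centers laid out like a "C": nodes 0 and 5 are close
# in space, but the only loop-free way to connect everything goes around.
points = [(0, 0), (0.9, 0), (2, 0), (2, 0.9), (1, 1.2), (0.1, 1.2)]

def mst_edges(pts):
    """Prim's algorithm: returns the MST as a list of (i, j) edges."""
    in_tree, edges = {0}, []
    while len(in_tree) < len(pts):
        i, j = min(
            ((i, j) for i in in_tree for j in range(len(pts)) if j not in in_tree),
            key=lambda e: math.dist(pts[e[0]], pts[e[1]]),
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges

def tree_hops(edges, start, goal):
    """Number of edges on the unique tree path from start to goal (BFS)."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    frontier, seen, hops = [start], {start}, 0
    while goal not in frontier:
        frontier = [v for u in frontier for v in adj[u] if v not in seen]
        seen.update(frontier)
        hops += 1
    return hops

def agglomerate(pts, k, linkage):
    """Naive agglomerative clustering down to k groups ('single' or 'average')."""
    clusters = [[i] for i in range(len(pts))]

    def d(a, b):
        ds = [math.dist(pts[i], pts[j]) for i in a for j in b]
        return min(ds) if linkage == "single" else sum(ds) / len(ds)

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: d(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters.pop(b)  # a < b, so index a stays valid
    return sorted(sorted(c) for c in clusters)

edges = mst_edges(points)
print("MST edges:", edges)  # a single chain 0-1-2-3-4-5

# Nodes 0 and 5 are closer in space than nodes 0 and 2,
# yet they sit at opposite ends of the tree.
print("0-5: distance", round(math.dist(points[0], points[5]), 3),
      "tree hops", tree_hops(edges, 0, 5))
print("0-2: distance", round(math.dist(points[0], points[2]), 3),
      "tree hops", tree_hops(edges, 0, 2))

single = agglomerate(points, 2, "single")
average = agglomerate(points, 2, "average")
print("single linkage: ", single)   # [[0, 1], [2, 3, 4, 5]]
print("average linkage:", average)  # [[0, 1, 4, 5], [2, 3]]
```

In this toy layout, single linkage cuts the chain at one edge, so each group is contiguous on the tree. Average linkage instead merges the two ends of the chain (nodes 0, 1 and 4, 5) and separates the tree neighbours 1 and 2, which is the same kind of pattern described in the question. Real data in many dimensions will behave less tidily, so the heatmap check remains the more reliable guide.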

Hope this helps, Sofie
