gene-level module membership score

mattiat commented 4 years ago

Currently, the methods produce "hard" clustering results, i.e. binary assignment of genes to modules.

POTENTIAL FEATURE: optionally output some kind of module membership score (like the correlation-based kME score in WGCNA or the posterior probability of a mixture model).

sergio-gomez commented 4 years ago

If this just means calculating a score (or several) for each gene, given the modules currently found by the implemented methods, then it seems OK to me.

If, on the other hand, the idea is to modify the algorithms so that they provide those scores themselves, then this is much more difficult and deviates from the purpose of MONET: implementing the best-performing methods of the DREAM challenge.

Thus, I endorse the first option and discourage the second one.

Anyway, I believe the optimal approach to obtain membership scores is to use community detection methods that are specifically designed to do so, which means using completely different algorithms. I mean, I'm not sure about the quality of the scores we could add on top of the current K1, M1 and R1.

DanielMedic commented 4 years ago

Thanks for opening this issue. I'm the user who inquired about the feature's availability. I absolutely agree that if the feature doesn't exist in the original algorithms, it would be a mistake to try to shoehorn it in. Because the methods all use the same similarity matrix as input, I'm going to try calculating a post hoc score as the mean of similarities for each gene to other genes in its module (possibly geometric mean; I'll have to look at the distributions). If that works out well, and you decide you want to include it as a feature, I'll be happy to pass along the relevant R code.
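A minimal sketch of this post hoc score, written in Python rather than R and not taken from MONET. It assumes the similarity matrix is a symmetric list of lists and a module is a list of gene indices; the function name `mean_similarity` and the handling of zero similarities in the geometric case are illustrative assumptions.

```python
import math


def mean_similarity(sim, gene, module, geometric=False):
    """Mean similarity of `gene` to the *other* genes in `module`.

    sim    -- symmetric similarity matrix (list of lists), hypothetical layout
    gene   -- index of the gene being scored
    module -- list of gene indices assigned to the same module
    """
    others = [sim[gene][j] for j in module if j != gene]
    if not others:
        # A lone gene has no partners to average over.
        return float("nan")
    if geometric:
        # Assumption: any zero similarity collapses the geometric mean to 0.
        if any(v <= 0 for v in others):
            return 0.0
        return math.exp(math.fsum(math.log(v) for v in others) / len(others))
    return math.fsum(others) / len(others)
```

The geometric mean penalises a gene that is weakly connected to even one module member much more than the arithmetic mean does, which is why the choice depends on the distribution of similarities.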

DanielMedic commented 4 years ago

I've implemented a membership calculation for our data. We're using topological overlap matrices (TOMs) as used in WGCNA for our similarity matrices. Here's what I wrote in the README:

Membership scores are calculated as follows. For each dataset, and for each gene in each cluster, a gene's raw membership is defined as the mean of its TOM values with all genes in its cluster, including itself, where a gene's TOM with itself is always 1. For example, suppose a gene is in a cluster with three other genes with which its TOM values are 0.1, 0.05, and 0.01. Then its raw membership score in the cluster is (1 + 0.1 + 0.05 + 0.01) / 4 = 0.29. One implication of this is that memberships in singleton clusters, i.e. clusters containing only one gene, are always 1.
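The raw membership can be sketched as follows. This is an illustration in Python, not the R code mentioned in the thread, and it follows the worked example by averaging over the gene's own cluster. The function name `raw_membership` and the data layout (a list-of-lists TOM and a dict from cluster id to gene indices) are assumptions.

```python
def raw_membership(tom, clusters):
    """Raw membership of every gene: mean TOM with all genes in its cluster.

    tom      -- symmetric TOM matrix as a list of lists (hypothetical layout)
    clusters -- dict mapping a cluster id to the list of gene indices in it
    """
    raw = {}
    for members in clusters.values():
        for g in members:
            # A gene's TOM with itself counts as 1, whatever the diagonal holds.
            total = sum(1.0 if j == g else tom[g][j] for j in members)
            raw[g] = total / len(members)
    return raw


# The worked example: gene 0 shares a cluster with three other genes.
tom = [
    [1.0, 0.1, 0.05, 0.01],
    [0.1, 1.0, 0.2, 0.3],
    [0.05, 0.2, 1.0, 0.4],
    [0.01, 0.3, 0.4, 1.0],
]
print(round(raw_membership(tom, {"A": [0, 1, 2, 3]})[0], 2))  # 0.29
```

Because the self term is always 1, a singleton cluster gives 1 / 1 = 1 without any special casing.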

Because TOM values are often very low, membership values can be quite low too. Thus the final membership score is calculated by dividing the raw memberships by their maximum value across the entire dataset, excluding singletons. (Singleton values are then set back to 1.) This helps bring the scores in line with those seen in other cluster membership measures such as kME in WGCNA or posterior probabilities in mixture models.
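A minimal sketch of this rescaling step, again in Python and under assumed names: `raw` maps gene index to raw membership, and `clusters` maps cluster id to a list of gene indices. Returning 1 for every gene when the dataset contains only singletons is an assumption; the description above does not cover that case.

```python
def normalize_membership(raw, clusters):
    """Divide raw memberships by the largest non-singleton value.

    raw      -- dict mapping gene index to raw membership score
    clusters -- dict mapping a cluster id to the list of gene indices in it
    """
    singletons = {m[0] for m in clusters.values() if len(m) == 1}
    non_singleton = [v for g, v in raw.items() if g not in singletons]
    if not non_singleton:
        # Assumption: with only singleton clusters, every score stays at 1.
        return {g: 1.0 for g in raw}
    top = max(non_singleton)
    # Singletons are excluded from the maximum and set back to 1 afterwards.
    return {g: 1.0 if g in singletons else v / top for g, v in raw.items()}
```

After this step the best-connected gene in any non-singleton cluster scores exactly 1, and all other non-singleton genes fall in the interval (0, 1].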

If this sounds like something you'd like to include as an option in MONET, let me know and I'll post the code. :)

mattiat commented 4 years ago

Thank you, Daniel! I sent you an invitation as a collaborator. I am very happy to open this project to the community!

BergmannLab / MONET

gene-level module membership score #11