BergmannLab / MONET

MONET : MOdularising NEtwork Toolbox - https://doi.org/10.1093/bioinformatics/btaa236
GNU General Public License v3.0

maximum number of genes in cluster for K1 #54

Open jpinero opened 2 years ago

jpinero commented 2 years ago

Does it make sense to add a new parameter that allows manipulating the cluster size in the DSD method (K1)? Is there any rationale (or reference) behind setting the maximum number of genes in a cluster to 100? Thanks!

mattiat commented 2 years ago

Hello, what you are looking for is the --nclusters parameter. Please check "OPTIONAL PARAMETERS" in the README file or the command-line help function. Best, Mattia

jpinero commented 2 years ago

Thanks for your answer! I had indeed checked the "OPTIONAL PARAMETERS" section in the README file, and after reading the parameters for K1 my impression was that --nclusters was not what I was looking for. After changing --nclusters to 200, I get relatively larger clusters, but the maximum number of genes in a cluster is still 100. What I would like are clusters with more than 100 genes, and from the documentation I think I cannot change this limit, right? Cheers, Janet

jjc2718 commented 2 years ago

@jpinero that's correct, the --nclusters parameter controls the number of centroids used in the spectral clustering portion of the algorithm, so it sounds like it's not what you want.

There isn't currently an easy command-line way to change the maximum cluster size (the original challenge required a limit of 100 genes and we never needed to modify it), but you can change it by editing the source code here: https://github.com/BergmannLab/MONET/blob/e090e3252ae595933da848e81b8551a6710b27e6/.src/K1_code/clustering/split_clusters.py#L36

If you edit your version of the code to increase that number, you should get larger clusters. Alternatively, you can completely skip the recursive clustering step (which will give you no limit on cluster size) by 1) commenting out this line here

https://github.com/BergmannLab/MONET/blob/e090e3252ae595933da848e81b8551a6710b27e6/.src/K1_code/runTusk.sh#L60

and 2) editing this line to point to ./data/cluster_results/network_clusters.txt instead of ./data/cluster_results/network_clusters_split.txt

https://github.com/BergmannLab/MONET/blob/e090e3252ae595933da848e81b8551a6710b27e6/.src/K1_code/runTusk.sh#L63

You may have to rerun the install script using the directions in the README to get your local changes to have an effect. Sorry this is a bit complicated - let me know if you run into any difficulties, and I can try to help.
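To illustrate what the size cap does, here is a minimal Python sketch of the general idea. It is not MONET's actual code: any cluster larger than `max_size` is split again until every piece fits. The names `split_until_small`, `halve`, and `max_size` are made up for this example, and `halve` is only a stand-in for the real re-clustering step.

```python
from typing import Callable, List, Sequence

Cluster = List[str]


def halve(cluster: Sequence[str]) -> List[Cluster]:
    """Placeholder splitter: cut a cluster into two halves by position.

    Assumption: a real pipeline would re-cluster the members here
    (e.g. with another round of clustering) instead of cutting blindly.
    """
    mid = len(cluster) // 2
    return [list(cluster[:mid]), list(cluster[mid:])]


def split_until_small(
    clusters: Sequence[Sequence[str]],
    max_size: int = 100,
    splitter: Callable[[Sequence[str]], List[Cluster]] = halve,
) -> List[Cluster]:
    """Keep splitting clusters until none has more than max_size members."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    result: List[Cluster] = []
    pending = [list(c) for c in clusters]  # clusters still to be checked
    while pending:
        cluster = pending.pop()
        if len(cluster) <= max_size:
            result.append(cluster)  # small enough, keep as-is
        else:
            parts = [p for p in splitter(cluster) if p]
            if len(parts) < 2:
                raise ValueError("splitter failed to divide the cluster")
            pending.extend(parts)  # re-check each piece
    return result


# Hypothetical input: one cluster of 250 made-up gene names.
genes = [f"gene{i}" for i in range(250)]
capped = split_until_small([genes], max_size=100)
uncapped = split_until_small([genes], max_size=len(genes))
```

In this sketch, raising `max_size` lets larger clusters through unchanged, which should be analogous to increasing the hard-coded number in `split_clusters.py`. Setting it at or above the size of the largest cluster means nothing gets split, similar in effect to skipping the step entirely.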

I agree that it would be good to have this option as a command-line parameter as well. I'll try to add one in the next week or two, if I can find the time.