Too many missing genes - GitHub issues

Thanks for your question. In many examples that I have seen, some genes of interest genuinely do not co-express with enough other genes to form a cluster. Many of these observations suggest that such genes might be co-operating at the proteomic level rather than at the transcriptional regulatory level.

However, to get less strict clusters, you can reduce the value of the -t parameter. By default, -t is 1.0; smaller values (e.g. 0.5, 0.1, or even 0.0) give clusters that contain more genes but are less tight, while larger values (e.g. 2.0, 5.0, or 10.0) give tighter clusters. For example, try adding -t 0.5 to your command and see if this solves your problem.
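
As a rough sketch of how you might compare a few tightness values, the snippet below prints (without executing) the command from this issue with different -t values appended. The helper name clust_cmd and the particular list of values are only illustrative; the paths and the other options are copied from your command.

```shell
# Dry run: print the clust command for several -t values instead of running it.
# clust_cmd is a hypothetical helper name; $1 is the tightness value for -t.
clust_cmd() {
  echo "clust Data/ -r Replicates.txt -n Normalisation.txt -cs 5 -t $1"
}

# Values below the default of 1.0 give larger, looser clusters;
# values above it give tighter clusters.
for t in 0.1 0.5 1.0 2.0; do
  clust_cmd "$t"
done
```

Removing the echo inside clust_cmd would run each command for real; in that case you would probably want to keep the results of each run separate so that they can be compared side by side.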

Regarding forcing clust to use a fixed K value: this is not an option, given how clust operates. I wouldn't call the K value from mclust the "absolute optimum"; rather, it is "optimum according to the criteria of mclust". Similarly, clust aims to identify the optimum K value automatically according to its own criteria. It is hard to claim an absolutely optimum K value a priori unless there is very strong evidence from domain knowledge or the data were synthesised as such. Many other clustering algorithms, such as WGCNA, MCA, and cross-clustering, also identify their "optimum K value" automatically. Which K value is the correct one? Each of them is correct according to the criteria that were used to find it.

The philosophy of clust is to extract, out of noisy datasets, optimum clusters that are "really co-expressed", that is, clusters whose expression profiles are highly correlated. In my opinion, this should reduce the many false positives you get from clustering algorithms that generate large but loose clusters, which might not look co-expressed on manual inspection.

I hope that this discussion helps you in what you are trying to achieve.

All the best and please feel free to come back to me with any further questions :)

Basel

BaselAbujamous / clust

Too many missing genes #43

clust Data/ -r Replicates.txt -n Normalisation.txt -cs 5