BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets

Too many genes not included in any cluster #44

Closed LQHHHHH closed 5 years ago

LQHHHHH commented 5 years ago

Hi, I am using clust to analyze two datasets from the same species, but only about 6,000 of the 40,000+ input genes were clustered; the rest were not included in any cluster. I tried adjusting the -t option: with t=0.1, still only 10,000+ genes were clustered. Below is part of my Summary.tsv file for the run with t=1.

Starting data and time Saturday 20 April 2019 (13:29:50)
Ending date and time Saturday 20 April 2019 (13:42:59)
Time consumed 0 hours, 13 minutes, and 8 seconds
Number of datasets 2
Total number of input genes 46458
Genes included in the analysis 36447
Genes filtered out from the analysis 10011
Number of clusters 20
Total number of genes in clusters 6049
Genes not included in any cluster 30398

Do you have any suggestions for increasing the number of genes in clusters? Many thanks.
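To put the Summary.tsv figures above in proportion, here is a minimal Python sketch that checks the counts are internally consistent and computes what share of genes ended up in clusters. The dictionary is typed in by hand from the excerpt, and the function name `clustered_fractions` is only illustrative; it is not part of clust.

```python
# Figures copied by hand from the Summary.tsv excerpt above (run with t=1).
summary = {
    "Total number of input genes": 46458,
    "Genes included in the analysis": 36447,
    "Genes filtered out from the analysis": 10011,
    "Total number of genes in clusters": 6049,
    "Genes not included in any cluster": 30398,
}


def clustered_fractions(s):
    """Return (share of analysed genes clustered, share of all input genes clustered)."""
    total = s["Total number of input genes"]
    included = s["Genes included in the analysis"]
    clustered = s["Total number of genes in clusters"]

    # Sanity checks: the reported counts should add up.
    assert included + s["Genes filtered out from the analysis"] == total
    assert clustered + s["Genes not included in any cluster"] == included

    return clustered / included, clustered / total


frac_included, frac_total = clustered_fractions(summary)
print(f"{frac_included:.1%} of analysed genes, {frac_total:.1%} of all input genes")
```

Note that the 10,011 filtered-out genes never enter the clustering step, so the share of analysed genes is probably the more meaningful of the two numbers.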

BaselAbujamous commented 5 years ago

Hi. Thanks for using clust and for your question.

Clust aims to extract tight, optimal clusters rather than forcing every input gene into the output. Getting 10,000 genes out of 40,000+ may well make sense, because forcing the rest into clusters would add noise to them. However, you may still want clusters wider than what -t=0.1 gives. Try increasing the -q3s parameter, for example to 3.0 (the default is 2.0), or reducing -t to 0.0 (its minimum allowed value).
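One way to try these suggestions side by side is to run clust once per parameter combination and compare the resulting Summary.tsv files. The sketch below only builds and prints the command lines; it does not run them. It assumes the usual `clust <data_path> -t <tightness> -q3s <value> -o <output_directory>` form, and the `Data/` input directory, the `Results_...` output names, and the helper `build_clust_commands` are hypothetical placeholders.

```python
import itertools
import shlex


def build_clust_commands(data_dir, tightness_values, q3s_values, out_prefix="Results"):
    """Build one clust command (as an argument list) per (-t, -q3s) combination."""
    commands = []
    for t, q3s in itertools.product(tightness_values, q3s_values):
        # Separate output directory per run so the summaries do not overwrite each other.
        out_dir = f"{out_prefix}_t{t}_q3s{q3s}"
        commands.append(
            ["clust", data_dir, "-t", str(t), "-q3s", str(q3s), "-o", out_dir]
        )
    return commands


# Values taken from the suggestions above: -t 0.1 or 0.0, -q3s 2.0 (default) or 3.0.
commands = build_clust_commands("Data/", [0.1, 0.0], [2.0, 3.0])
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or passed to `subprocess.run(cmd, check=True)` if clust is installed and on the PATH.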

Please let me know if any more help is needed.

Best,
Basel