BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets

drastically different results between version 1.8.10 and version 1.12.0 #64

Open ereverman opened 3 years ago

ereverman commented 3 years ago

I reanalyzed some data I originally analyzed with version 1.8.10 and am not able to recapitulate those results with version 1.12.0. I see a warning message at the 80% point of step 3 of the analysis in 1.12.0 but it doesn't seem relevant to my issue or to clust: "FutureWarning: 'n_jobs' was deprecated in version 0.23 and will be removed in 0.25."

Do you have a sense of what might be leading to the different results? My implementation of clust is straightforward:

clust input_file -o output_dir
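Since the FutureWarning above looks like it comes from a dependency rather than from clust itself (the 0.23/0.25 version numbers appear to be scikit-learn's, though that is an assumption), it may help to record the installed package versions in both environments alongside the clust version. A minimal sketch follows; the package list is a guess at what clust depends on, not taken from its setup file, and `installed_versions` is a helper name made up for this example.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7 needs the separately installed backport package
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant packages; adjust to the actual environment.
    candidates = ["clust", "numpy", "scipy", "scikit-learn", "pandas"]
    for name, version in installed_versions(candidates).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this once in the 1.8.10 environment and once in the 1.12.0 environment would show whether anything besides clust changed between the two analyses.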

Thanks, Elizabeth

MiguelCos commented 3 years ago

I just noticed similar behavior between versions 1.10 and 1.12.

The exact same input data produce totally different results.

Version 1.10 produces 6 clusters from 299 proteins while version 1.12 only finds 1 cluster.

I would really appreciate any input!

Let me know if you require any additional information.

Best, Miguel

Result summary, version 1.10

PS C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_analysis> clust .\clust_data\

/===========================================================================\
|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|          Python package version 1.10.10 (2019) Basel Abu-Jamous           |
+---------------------------------------------------------------------------+
| Analysis started at: Friday 04 December 2020 (09:46:55)                   |
| 1. Reading dataset(s)                                                     |
| 2. Data pre-processing                                                    |
C:\Program Files\WPy64-3770\python-3.7.7.amd64\lib\site-packages\clust\scripts\preprocess_data.py:19: RuntimeWarning: invalid value encountered in greater
  I = np.bitwise_and(~isnan(X), X>0)
|  - Automatic normalisation mode (default in v1.7.0+).                     |
|    Clust automatically normalises your dataset(s).                        |
|    To switch it off, use the `-n 0` option (not recommended).             |
|    Check https://github.com/BaselAbujamous/clust for details.             |
|  - Flat expression profiles filtered out (default in v1.7.0+).            |
|    To switch it off, use the --no-fil-flat option (not recommended).      |
|    Check https://github.com/BaselAbujamous/clust for details.             |
| C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_a |
| nalysis/Results_04_Dec_20                                                 |
+---------------------------------------------------------------------------+
| Analysis finished at: Friday 04 December 2020 (09:46:59)                  |
| Total time consumed: 0 hours, 0 minutes, and 4 seconds                    |
|                                                                           |
\===========================================================================/

/===========================================================================\
|                              RESULTS SUMMARY                              |
+---------------------------------------------------------------------------+
| Clust received 1 dataset with 299 unique genes. After filtering, 299      |
| genes made it to the clustering step. Clust generated 6 clusters of       |
| genes, which in total include 205 genes. The smallest cluster includes    |
| 13 genes, the largest cluster includes 67 genes, and the average cluster  |
| size is 34 genes.                                                         |
\===========================================================================/

Result summary on the same data, Clust version 1.12:


PS C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_analysis> clust .\clust_data\

/===========================================================================\
|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.12.0 (2019) Basel Abu-Jamous           |
+---------------------------------------------------------------------------+
| Analysis started at: Friday 04 December 2020 (10:15:55)                   |
| 1. Reading dataset(s)                                                     |
| 2. Data pre-processing                                                    |
C:\Users\migue\AppData\Roaming\Python\Python37\site-packages\clust\scripts\preprocess_data.py:19: RuntimeWarning: invalid value encountered in greater
  I = np.bitwise_and(~isnan(X), X>0)
|  - Automatic normalisation mode (default in v1.7.0+).                     |
|    Clust automatically normalises your dataset(s).                        |
|    To switch it off, use the `-n 0` option (not recommended).             |
|    Check https://github.com/BaselAbujamous/clust for details.             |
|  - Flat expression profiles filtered out (default in v1.7.0+).            |
|    To switch it off, use the --no-fil-flat option (not recommended).      |
|    Check https://github.com/BaselAbujamous/clust for details.             |
| 3. Seed clusters production (the Bi-CoPaM method)                         |
| C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_a |
| nalysis/Results_04_Dec_20_1                                               |
+---------------------------------------------------------------------------+
| Analysis finished at: Friday 04 December 2020 (10:15:59)                  |
| Total time consumed: 0 hours, 0 minutes, and 4 seconds                    |
|                                                                           |
\===========================================================================/

/===========================================================================\
|                              RESULTS SUMMARY                              |
+---------------------------------------------------------------------------+
| Clust received 1 dataset with 299 unique genes. After filtering, 299      |
| genes made it to the clustering step. Clust generated 1 clusters of       |
| genes, which in total include 67 genes. The smallest cluster includes 67  |
| genes, the largest cluster includes 67 genes, and the average cluster     |
| size is 67 genes.                                                         |
\===========================================================================/
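
To put a number on how much the two runs differ beyond the cluster counts, the cluster memberships can be compared directly. The sketch below assumes each run's clusters have already been loaded into a dict mapping cluster name to a set of gene IDs; the loading step is left out because it depends on the output file layout, and the gene IDs and cluster names shown are made up for illustration. It reports, for each cluster in one run, the best-overlapping cluster in the other run by Jaccard index.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(run_a, run_b):
    """For each cluster in run_a, find the run_b cluster with the highest overlap.

    Returns {cluster_name_a: (best_cluster_name_b or None, jaccard_score)}.
    """
    matches = {}
    for name_a, genes_a in run_a.items():
        best_name, best_score = None, 0.0
        for name_b, genes_b in run_b.items():
            score = jaccard(genes_a, genes_b)
            if score > best_score:
                best_name, best_score = name_b, score
        matches[name_a] = (best_name, best_score)
    return matches


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the runs above.
    old_run = {"C0": {"g1", "g2", "g3"}, "C1": {"g4", "g5"}}
    new_run = {"C0": {"g1", "g2", "g3"}}
    for name, (match, score) in best_matches(old_run, new_run).items():
        print(f"{name} -> {match} (Jaccard {score:.2f})")
```

A score of 1.0 means a cluster was reproduced exactly, while a `None` match means none of its genes landed in any cluster in the other run.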

TommyPhannareth commented 3 years ago

Just chiming in that I'm experiencing similar replication issues between versions 1.10 and 1.12: a drastic reduction in the number of clusters.

sagarutturkar commented 3 years ago

I also had a similar issue: v1.12 gets only a single cluster in two different datasets.