BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets
Data pre-processing, AttributeError: 'float' object has no attribute 'replace' #29

johnsolk commented 5 years ago

Hello @BaselAbujamous, thank you for providing this capacity for looking at gene expression data from multiple species! This is perfect for my project, and I'm very excited about clust.

I ran into a problem with the following command using Orthofinder output:

clust species-expression -d 17 -r species_replicates -m Orthogroups.csv 

The error is below. Can you please tell me if there is a problem with my formatting?

Here are the files used for this run:

curl -L -o Orthogroups.csv
curl -L -o species_replicates
curl -L -o species-expression.tar.gz
tar -xvzf species-expression.tar.gz

Output error:

|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.8.10 (2018) Basel Abu-Jamous           |
| Analysis started at: Friday 25 January 2019 (00:16:46)                    |
| 1. Reading dataset(s)                                                     |
| 2. Data pre-processing                                                    |
Traceback (most recent call last):
  File "/opt/miniconda3/envs/run_clust/bin/clust", line 11, in <module>
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/", line 98, in main
    args.cs,, args.optimisation, args.q3s, args.basemethods, args.deterministic)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/", line 97, in clustpipeline
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 436, in calculateGDMandUpdateDatasets
    OGsFirstColMap, delimGenesInMap)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 397, in mapGenesToCommonIDs
    Maploc[i, j] = re.split(delimGenesInMap, Maploc[i, j].replace('.', 'thisisadot').replace('-', 'thisisadash').replace('/', 'thisisaslash'))
AttributeError: 'float' object has no attribute 'replace'

On an Ubuntu 18.04 instance, Conda py2.7 environment

johnsolk commented 5 years ago

I've tried to work around this problem by assigning OGs to transcript IDs from the OrthoFinder output by hand. There are still missing genes, and the error below occurs. I think it is similar to the bug identified in #28.


clust species_expression -d 16 -r species_replicates


|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.8.10 (2018) Basel Abu-Jamous           |
| Analysis started at: Friday 25 January 2019 (23:19:29)                    |
| 1. Reading dataset(s)                                                     |
| 2. Data pre-processing                                                    |
|  - Automatic normalisation mode (default in v1.7.0+).                     |
|    Clust automatically normalises your dataset(s).                        |
|    To switch it off, use the `-n 0` option (not recommended).             |
|    Check for details.             |
Traceback (most recent call last):
  File "/opt/miniconda3/envs/run_clust/bin/clust", line 11, in <module>
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/", line 98, in main
    args.cs,, args.optimisation, args.q3s, args.basemethods, args.deterministic)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/", line 102, in clustpipeline
    filteringtype=filteringtype, filterflat=filflat, params=None, datafiles=datafiles)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 675, in preprocess
    (Xproc[l], codes) = normaliseSampleFeatureMat(Xproc[l], normaliseloc[l])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 273, in normaliseSampleFeatureMat
    Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 363, in normaliseSampleFeatureMat
    codes = autoNormalise(Xout)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 213, in autoNormalise
    Xl = normaliseSampleFeatureMat(Xloc, [3])[0]  # index 1  (Xloc, i.e. original X is index 0)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 273, in normaliseSampleFeatureMat
    Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 295, in normaliseSampleFeatureMat
    Xout[ind1] = fixnans(Xout[ind1])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 70, in fixnans
    sumnans = sum(isnan(Xinloc[i]))
TypeError: 'bool' object is not iterable

BaselAbujamous commented 5 years ago

Thanks again, Lisa, for reporting this bug. I found the issue and fixed it. Try installing clust version 1.8.11 (the latest version) and it should work :)

Looking forward to your next blog post :)

If any further problems appear please let me know.

All the best Basel

BaselAbujamous commented 5 years ago

Hi again, I have tested clust on your data, which is taking a long time, but it's okay.

Clust exited with another error, caused by the fact that one of the 17 datasets, "F_notti.tsv", has only one condition. The replicates file shows that this dataset has two samples that are replicates of a single condition, so when the two replicates are summarised, the dataset is left with a single column of data. Clustering doesn't really make sense over a single condition (a single dimension). This error is explained in issue #14.
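As a rough illustration of why a single remaining column can produce the `'bool' object is not iterable` message seen in the traceback: if each "row" collapses to a bare float, a NaN test returns a single boolean, and `sum()` cannot iterate over it. The sketch below is not clust's code; `count_nans_per_row` is a hypothetical helper, and it assumes rows are plain Python lists or floats.

```python
import math

def count_nans_per_row(rows):
    """Count NaNs per row, tolerating rows that collapsed to a single scalar."""
    counts = []
    for row in rows:
        # With only one column, a "row" may be a bare float rather than a sequence.
        values = row if isinstance(row, (list, tuple)) else [row]
        counts.append(sum(math.isnan(v) for v in values))
    return counts

# Reproduce the failure mode: summing the result of a scalar NaN test.
try:
    sum(math.isnan(1.0))
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

two_conditions = [[1.0, float("nan")], [2.0, 3.0]]  # two columns per row
one_condition = [1.0, float("nan")]                 # each row is a scalar
print(count_nans_per_row(two_conditions))
print(count_nans_per_row(one_condition))
```

Wrapping the scalar in a list is only one way to make such a loop robust; excluding single-condition datasets avoids the situation entirely.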

Possible solutions:

  1. To remove the row related to this dataset in the replicates file, so clust will automatically treat the two samples in this dataset as two independent samples (I am testing it now).
  2. To exclude this particular dataset from analysis, as it does not have sufficient complexity for cluster analysis.
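Before choosing between these options, it can help to list which datasets end up with fewer than two conditions once replicates are summarised. The following is a minimal sketch, assuming each line of the replicates file holds a dataset file name, a condition name, and then the replicate column names; the function name and the example rows (other than the "F_notti.tsv" file name) are hypothetical.

```python
from collections import defaultdict

def conditions_per_dataset(lines):
    """Return {dataset file name: number of distinct conditions}."""
    conds = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumed layout: <dataset> <condition> <replicate> [<replicate> ...]
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            continue
        conds[fields[0]].add(fields[1])
    return {name: len(c) for name, c in conds.items()}

# Hypothetical replicates-file content for illustration only.
example = [
    "A_species.tsv cond1 A_r1 A_r2",
    "A_species.tsv cond2 A_r3 A_r4",
    "F_notti.tsv cond1 F_r1 F_r2",
]
counts = conditions_per_dataset(example)
single = sorted(name for name, n in counts.items() if n < 2)
print(single)
```

Any dataset reported here would collapse to one column after replicate summarisation and is a candidate for either of the two options above.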

Best! Basel

johnsolk commented 5 years ago

Thanks, @BaselAbujamous! I did update to version 1.8.11. However, I get this error below now.

Here are re-formatted files with OG assigned by hand, if you would like to take a look:

curl -L -o species_replicates
curl -L -o species-expression-OG.tar.gz
tar -xvzf species-expression-OG.tar.gz


clust species_expression -d 16 -r species_replicates


|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.8.11 (2018) Basel Abu-Jamous           |
| Analysis started at: Saturday 26 January 2019 (19:21:30)                  |
| 1. Reading dataset(s)                                                     |
| 2. Data pre-processing                                                    |
|  - Automatic normalisation mode (default in v1.7.0+).                     |
|    Clust automatically normalises your dataset(s).                        |
|    To switch it off, use the `-n 0` option (not recommended).             |
|    Check for details.             |
Traceback (most recent call last):
  File "/opt/miniconda3/envs/run_clust/bin/clust", line 11, in <module>
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/", line 98, in main
    args.cs,, args.optimisation, args.q3s, args.basemethods, args.deterministic)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/", line 102, in clustpipeline
    filteringtype=filteringtype, filterflat=filflat, params=None, datafiles=datafiles)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 675, in preprocess
    (Xproc[l], codes) = normaliseSampleFeatureMat(Xproc[l], normaliseloc[l])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 273, in normaliseSampleFeatureMat
    Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 363, in normaliseSampleFeatureMat
    codes = autoNormalise(Xout)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 213, in autoNormalise
    Xl = normaliseSampleFeatureMat(Xloc, [3])[0]  # index 1  (Xloc, i.e. original X is index 0)
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 273, in normaliseSampleFeatureMat
    Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 295, in normaliseSampleFeatureMat
    Xout[ind1] = fixnans(Xout[ind1])
  File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/", line 70, in fixnans
    sumnans = sum(isnan(Xinloc[i]))
TypeError: 'bool' object is not iterable

BaselAbujamous commented 5 years ago


The problem fixed in the new version was the one related to reading missing orthologues from the orthogroups file. The error you have just reported, however, is the one I described in my previous comment, related to the "F_notti.tsv" dataset.
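For readers hitting the original `AttributeError: 'float' object has no attribute 'replace'`: a missing orthologue cell in a table is commonly represented as a float NaN rather than a string, so string methods fail on it. The sketch below shows one way to guard against that when splitting a cell into gene IDs. It is written for Python 3 and is not the actual fix in clust; `split_map_cell` and the delimiter pattern are assumptions made for illustration.

```python
import re

def split_map_cell(cell, delim=r"[,\s]+"):
    """Split one orthogroup-map cell into gene IDs; missing cells give []."""
    # A missing cell typically arrives as float NaN (or None), not as a string,
    # so calling .replace() or re.split() on it directly would fail.
    if not isinstance(cell, str):
        return []
    cell = cell.strip()
    if not cell:
        return []
    return [gene for gene in re.split(delim, cell) if gene]

# Hypothetical row: the middle species has no orthologue in this orthogroup.
row = ["geneA1, geneA2", float("nan"), "geneC1"]
genes = [split_map_cell(c) for c in row]
print(genes)
```

Returning an empty list for a missing cell keeps the row shape intact, so downstream code can still tell which species lacks a gene in that orthogroup.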

Your data seems to properly test clust for multiple species! I like that! These iterations will make it robust.

Thanks and all the best! Basel

BaselAbujamous commented 5 years ago

Hi again :)

I have found another bug related to analysing your data. I believe I have fixed it. It is being tested on your data now before releasing version 1.8.12.

BaselAbujamous commented 5 years ago

Hi one more time.

Another bug has now been fixed in version 1.8.12. It was triggered when I removed the line for the "F_notti.tsv" dataset from the replicates file (for the reasons explained a few comments above).

It should work now. Happy to follow up on any further questions or discussion.

All the best Basel

BaselAbujamous commented 5 years ago

Hi. I believe this issue has been resolved so I am closing it. Please feel free to reopen it or to submit any other issue.

All the best Basel