BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets

Error during data pre-processing: 'bool' object is not iterable #24

Closed pilarcormo closed 5 years ago

pilarcormo commented 5 years ago

Hi,

I'm using clust for the first time, with TPM values to build my clusters. I'm using Python 2.7.14 and running it on an HPC cluster. I get this error:

| Analysis started at: Tuesday 15 January 2019 (09:52:58) |
| 1. Reading dataset(s) |
| 2. Data pre-processing |
Traceback (most recent call last):
  File "/nbi/Research-Groups/JIC/Diane-Saunders/Anaconda/Installation/bin/clust", line 11, in <module>
    sys.exit(main())
  File "/Anaconda/Installation/lib/python2.7/site-packages/clust/main.py", line 98, in main
    args.cs, args.np, args.optimisation, args.q3s, args.basemethods, args.deterministic)
  File "/Anaconda/Installation/lib/python2.7/site-packages/clust/clustpipeline.py", line 102, in clustpipeline
    filteringtype=filteringtype, filterflat=filflat, params=None, datafiles=datafiles)
  File "/Anaconda/Installation/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 630, in preprocess
    Xproc[l] = fixnans(Xproc[l])
  File "/Anaconda/Installation/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 70, in fixnans
    sumnans = sum(isnan(Xinloc[i]))
TypeError: 'bool' object is not iterable

Any ideas why this might be?

Thanks

BaselAbujamous commented 5 years ago

Hi Pilar

Thanks for reporting this. It's a strange error and I am not exactly sure why you are getting it, but I will make sure it is resolved so that clust runs successfully for you.

Two questions:

  1. What was the command you used to run clust? Is it simply: clust dataset_file
  2. What is the format of your data file?

Best wishes Basel

pilarcormo commented 5 years ago

Hi Basel,

Thanks so much for getting back to me so quickly.

  1. My command line is: clust dataset_file -r replicates-file.txt -o output_file -n 101 4
  2. My data files are tab-separated, with the gene names in the first column and the TPM values in the second.

BaselAbujamous commented 5 years ago

Hi Pilar

I know why you are getting this error :)

Your dataset has a single sample only (one column of TPM values). Clustering is not really applicable in principle to one-sample datasets as a single sample does not suffice to make patterns or profiles of gene expression. This error is also explained here:

https://github.com/BaselAbujamous/clust/issues/14
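
The mechanism behind the message can be reproduced in a few lines. The sketch below is an illustration only, not clust's actual code: `isnan` is a hypothetical stand-in helper that returns a list for a row of values and a plain bool for a single number, and `count_nans_per_gene` is a made-up name that mirrors the `sum(isnan(Xinloc[i]))` call from the traceback. With several samples each row is a list and `sum` works; with one sample each "row" is a single number, so `sum` receives a bool. It is written for Python 3.

```python
import math


def isnan(x):
    """Hypothetical stand-in helper (NOT clust's own isnan).

    Returns a list of bools for a row of values, or one bool for a scalar.
    """
    if isinstance(x, (list, tuple)):
        return [math.isnan(v) for v in x]
    return math.isnan(x)


def count_nans_per_gene(data):
    """Count missing values per gene, mirroring sum(isnan(X[i]))."""
    return [sum(isnan(row)) for row in data]


nan = float("nan")

# Three genes x three samples: every row is a list, so sum() works.
multi_sample = [[1.0, nan, 3.0], [0.5, 0.7, 0.9], [nan, nan, 2.0]]
multi_counts = count_nans_per_gene(multi_sample)

# Three genes x one sample held as a flat column: every "row" is a float,
# isnan() returns a bare bool, and sum() cannot iterate over it.
single_sample = [1.0, nan, 3.0]
try:
    count_nans_per_gene(single_sample)
    single_error = None
except TypeError as exc:
    single_error = str(exc)

print(multi_counts)
print(single_error)
```

If the same message appears with multi-sample files, one thing worth checking under this reading is whether any input is still being parsed as a single column, for example because of an unexpected delimiter.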

If you have multiple samples or you need further assistance in designing your experiment please don't hesitate to let me know the details :)

All the best Basel

pilarcormo commented 5 years ago

Hi Basel,

Thanks so much. That makes sense. I'll change the structure of my data and try again.

Pilar

pilarcormo commented 5 years ago

Hi again,

I changed my input files; now every file has between 5 and 12 samples, but I'm still getting exactly the same error message. Any ideas about what else I should change?

Thanks

Pilar

BaselAbujamous commented 5 years ago

So each file has the names of the genes in the first column, followed by 5 to 12 columns for the 5 to 12 samples, and the first row of the file is a header with the titles of the columns? Then you ran clust as:

clust dataset_file -r replicates-file.txt -o output_file -n 101 4

Correct?
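
The layout described above (a header row, gene names in the first column, then one column per sample) can be checked before running clust. The following is a minimal sketch, not part of clust: `describe_table` is a hypothetical helper, the tab delimiter comes from the earlier description of the files, and the "at least two sample columns" threshold is an assumption based on the single-sample explanation earlier in the thread.

```python
import csv
import io


def describe_table(text, delimiter="\t"):
    """Summarise a delimited expression table held in a string."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    n_samples = len(header) - 1  # first column holds the gene names
    # 1-based file line numbers of rows whose width differs from the header
    ragged = [i + 2 for i, row in enumerate(body) if len(row) != len(header)]
    return {
        "samples": n_samples,
        "genes": len(body),
        "ragged_lines": ragged,
        # Assumed rule of thumb: 2+ sample columns and a rectangular table
        "looks_ok": n_samples >= 2 and not ragged,
    }


good = "Gene\tS1\tS2\tS3\ng1\t1.0\t2.0\t3.0\ng2\t0.0\t5.5\t1.2\n"
single = "Gene\tTPM\ng1\t1.0\ng2\t0.0\n"

good_report = describe_table(good)
single_report = describe_table(single)
print(good_report)
print(single_report)
```

To check a real file, read its contents with `open(path).read()` and pass the result to `describe_table`.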

pilarcormo commented 5 years ago

So I'm running:

clust dataset_folder -r replicates-file.txt -o output_file -n 101 3 4

dataset_folder is where the 3 files with the samples' TPM values are. I added the 3 to the -n option because I have RNA-seq TPM data.

BaselAbujamous commented 5 years ago

I can't see why the same error would appear in this case if that folder only has those three dataset files. If you would like to send the data files, or one of them, confidentially to my email basel.abujamous@plants.ox.ac.uk, I can check why this happened. You can replace the gene names with any other anonymous labels if you are concerned about protecting the confidentiality of the data.

Otherwise, if you post the first few lines from each file here I may be able to detect the cause of the problem.
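
Collecting those first few lines, and confirming that the folder holds nothing but the dataset files, can be done with a short script. This is a sketch only: `preview_folder` is a hypothetical helper, the file names in the demo are made up, and the idea that a stray file in the folder could matter is an assumption drawn from the reply above rather than from clust's documentation.

```python
import os
import tempfile


def preview_folder(folder, n_lines=3):
    """Return {file name: first n_lines lines} for every file in folder.

    os.listdir includes hidden dotfiles, so stray files show up too.
    """
    previews = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", errors="replace") as handle:
            previews[name] = [
                line.rstrip("\n") for _, line in zip(range(n_lines), handle)
            ]
    return previews


# Demo on a throwaway folder; the file names are invented for illustration.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "leaf.tsv"), "w") as f:
        f.write("Gene\tS1\tS2\ng1\t1.0\t2.0\ng2\t3.0\t4.0\n")
    with open(os.path.join(demo_dir, ".hidden_notes"), "w") as f:
        f.write("not a dataset\n")
    previews = preview_folder(demo_dir, n_lines=2)

for name, lines in previews.items():
    print(name)
    for line in lines:
        print("    " + line)
```

Replacing `demo_dir` with the real dataset folder prints the header and first data rows of each file, which is the information requested above.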

Again, I would like to help until clust is running successfully for you.

BW Basel