Closed johnsolk closed 5 years ago
I've tried to code around this problem by assigning each OG to its transcript ID from the OrthoFinder output by hand. There are still missing genes, and this error occurs. I think this is similar to the bug identified in #28.
Command:
clust species_expression -d 16 -r species_replicates
Output:
/===========================================================================\
| Clust |
| (Optimised consensus clustering of multiple heterogenous datasets) |
| Python package version 1.8.10 (2018) Basel Abu-Jamous |
+---------------------------------------------------------------------------+
| Analysis started at: Friday 25 January 2019 (23:19:29) |
| 1. Reading dataset(s) |
| 2. Data pre-processing |
| - Automatic normalisation mode (default in v1.7.0+). |
| Clust automatically normalises your dataset(s). |
| To switch it off, use the `-n 0` option (not recommended). |
| Check https://github.com/BaselAbujamous/clust for details. |
Traceback (most recent call last):
File "/opt/miniconda3/envs/run_clust/bin/clust", line 11, in <module>
sys.exit(main())
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/__main__.py", line 98, in main
args.cs, args.np, args.optimisation, args.q3s, args.basemethods, args.deterministic)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/clustpipeline.py", line 102, in clustpipeline
filteringtype=filteringtype, filterflat=filflat, params=None, datafiles=datafiles)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 675, in preprocess
(Xproc[l], codes) = normaliseSampleFeatureMat(Xproc[l], normaliseloc[l])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 273, in normaliseSampleFeatureMat
Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 363, in normaliseSampleFeatureMat
codes = autoNormalise(Xout)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 213, in autoNormalise
Xl = normaliseSampleFeatureMat(Xloc, [3])[0] # index 1 (Xloc, i.e. original X is index 0)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 273, in normaliseSampleFeatureMat
Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 295, in normaliseSampleFeatureMat
Xout[ind1] = fixnans(Xout[ind1])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 70, in fixnans
sumnans = sum(isnan(Xinloc[i]))
TypeError: 'bool' object is not iterable
Thanks again, Lisa, for reporting this bug. I found the issue and fixed it. Try installing clust version 1.8.11 (the latest version) and it should work :)
Looking forward to your next blog post :)
If any further problems appear please let me know.
All the best Basel
Hi again, I have tested clust on your data, which is taking a long time, but it's okay.
Clust exited with another error, caused by the fact that one of the 17 datasets, "F_notti.tsv", has only one condition. The replicates file shows that this dataset has two samples that are replicates of a single condition, so once the two replicates are summarised, the dataset has a single column of data. Clustering doesn't really make sense over a single condition (a single dimension). This error is explained in issue #14.
Possible solutions:
Best! Basel
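The single-condition case described above can be spotted before running clust. Below is a minimal sketch, not part of clust itself, that assumes each line of the replicates file is whitespace-separated as dataset file name, condition name, then replicate names. The function names, the `F_other.tsv` dataset, and the condition and sample labels are hypothetical and used only for illustration.

```python
from collections import defaultdict


def conditions_per_dataset(replicates_lines):
    """Map each dataset name to the set of condition names listed for it.

    Assumed line layout (whitespace-separated):
    dataset_file  condition_name  replicate_1  replicate_2 ...
    """
    conditions = defaultdict(set)
    for line in replicates_lines:
        fields = line.split()
        # Skip blank, malformed, or comment lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        conditions[fields[0]].add(fields[1])
    return dict(conditions)


def single_condition_datasets(replicates_lines):
    """Return datasets that would collapse to one column after summarising replicates."""
    per_dataset = conditions_per_dataset(replicates_lines)
    return sorted(name for name, conds in per_dataset.items() if len(conds) < 2)


# Hypothetical replicates content: only F_notti.tsv has a single condition.
example = [
    "F_notti.tsv  cond_A  sample_1  sample_2",
    "F_other.tsv  cond_A  sample_1  sample_2",
    "F_other.tsv  cond_B  sample_3  sample_4",
]
print(single_condition_datasets(example))
```

To check a real file, the lines could be read with `open("species_replicates")` and passed to `single_condition_datasets`; any dataset it reports has fewer than two conditions.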
Thanks, @BaselAbujamous! I did update to version 1.8.11. However, I get this error below now.
Here are re-formatted files with OG assigned by hand, if you would like to take a look:
curl -L https://osf.io/cbfst/download -o species_replicates
curl -L https://osf.io/muxaf/download -o species-expression-OG.tar.gz
tar -xvzf species-expression-OG.tar.gz
Command:
clust species_expression -d 16 -r species_replicates
Output:
/===========================================================================\
| Clust |
| (Optimised consensus clustering of multiple heterogenous datasets) |
| Python package version 1.8.11 (2018) Basel Abu-Jamous |
+---------------------------------------------------------------------------+
| Analysis started at: Saturday 26 January 2019 (19:21:30) |
| 1. Reading dataset(s) |
| 2. Data pre-processing |
| - Automatic normalisation mode (default in v1.7.0+). |
| Clust automatically normalises your dataset(s). |
| To switch it off, use the `-n 0` option (not recommended). |
| Check https://github.com/BaselAbujamous/clust for details. |
Traceback (most recent call last):
File "/opt/miniconda3/envs/run_clust/bin/clust", line 11, in <module>
sys.exit(main())
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/__main__.py", line 98, in main
args.cs, args.np, args.optimisation, args.q3s, args.basemethods, args.deterministic)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/clustpipeline.py", line 102, in clustpipeline
filteringtype=filteringtype, filterflat=filflat, params=None, datafiles=datafiles)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 675, in preprocess
(Xproc[l], codes) = normaliseSampleFeatureMat(Xproc[l], normaliseloc[l])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 273, in normaliseSampleFeatureMat
Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 363, in normaliseSampleFeatureMat
codes = autoNormalise(Xout)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 213, in autoNormalise
Xl = normaliseSampleFeatureMat(Xloc, [3])[0] # index 1 (Xloc, i.e. original X is index 0)
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 273, in normaliseSampleFeatureMat
Xout, codesi = normaliseSampleFeatureMat(Xout, type[i])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 295, in normaliseSampleFeatureMat
Xout[ind1] = fixnans(Xout[ind1])
File "/opt/miniconda3/envs/run_clust/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 70, in fixnans
sumnans = sum(isnan(Xinloc[i]))
TypeError: 'bool' object is not iterable
Hi,
The problem fixed in the new version was the one related to reading missing orthologues from the orthogroups file. The error you have just reported, however, is the one I described in my last comment above, related to the "F_notti.tsv" dataset.
Your data seems to properly test clust for multiple species! I like that! These iterations will make it robust.
Thanks and all the best! Basel
Hi again :)
I have found another bug related to analysing your data. I believe I have fixed it. It is being tested on your data now before releasing version 1.8.12.
Hi one more time.
Another bug has now been fixed in version 1.8.12. It was triggered when I removed the line for the "F_notti.tsv" dataset from the replicates file (for the reasons explained a few comments above).
It should work now. Happy to follow it up with any further questions or discussions indeed.
All the best Basel
Hi. I believe this issue has been resolved so I am closing it. Please feel free to reopen it or to submit any other issue.
All the best Basel
Hello @BaselAbujamous, thank you for providing this capacity for looking at gene expression data from multiple species! This is perfect for my project, and I'm very excited about clust.
I ran into a problem with the following command using OrthoFinder output:
The error is below. Can you please tell me if there is a problem with my formatting?
Here are the files used for this run:
Output error:
On an Ubuntu 18.04 instance, Conda py2.7 environment