emvcaest closed this issue 5 years ago
Hi Emmelien,
Thanks for using Clust and for your question. Problem identified :)
The header line in the data file GSE17237.txt starts as:
ID TAB TAB GSM431528 TAB GSM431529 TAB GSM431530 ... etc.
There are two TAB characters after "ID", before the title of the first data column, so the method thinks your file has one more column than it really does. Just remove the extra TAB after "ID" and see how things go :)
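If it helps to check other files for the same issue before running them, here is a minimal sketch in plain Python. It is not part of Clust: `find_header_problems` and `drop_empty_header_fields` are hypothetical helper names, and the gene rows in the sample are made-up values used only for illustration. It flags empty header fields (two TABs in a row) and data rows whose field count differs from the header.

```python
def find_header_problems(lines, sep="\t"):
    """Compare the header row with the data rows of a delimited table.

    `lines` is any iterable of text lines, for example an open file.
    Returns a list of problem descriptions (empty if nothing was found).
    """
    # Keep the original line numbers and skip lines that are entirely blank.
    rows = [
        (number, line.rstrip("\r\n").split(sep))
        for number, line in enumerate(lines, start=1)
        if line.strip()
    ]
    if not rows:
        return ["no non-blank lines found"]

    header = rows[0][1]
    problems = []

    # An empty header field means two separators sit next to each other.
    for index, name in enumerate(header):
        if name.strip() == "":
            problems.append("empty header field at index %d (0-based)" % index)

    # Every data row should have as many fields as the header row.
    for number, fields in rows[1:]:
        if len(fields) != len(header):
            problems.append(
                "line %d has %d fields, header has %d"
                % (number, len(fields), len(header))
            )
    return problems


def drop_empty_header_fields(header_line, sep="\t"):
    """Return the header line with empty fields removed."""
    fields = header_line.rstrip("\r\n").split(sep)
    return sep.join(field for field in fields if field.strip() != "")


if __name__ == "__main__":
    # Header taken from this thread; the two data rows are invented.
    sample = [
        "ID\t\tGSM431528\tGSM431529\tGSM431530\n",
        "gene1\t0.12\t-0.30\t0.45\n",
        "gene2\t1.02\t0.08\t-0.77\n",
    ]
    for problem in find_header_problems(sample):
        print(problem)
    print(repr(drop_empty_header_fields(sample[0])))
```

To check a real file, pass an open file handle, for example `find_header_problems(open("GSE17237.txt"))`. The sketch only reports and repairs the header; it does not touch the data rows.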
Best wishes! Basel
Hi Basel,
That was indeed the problem, thanks for the help and sorry for bothering you with such a stupid mistake on my side.
Emmelien
No worries, Emmelien. So I will close this issue, and please feel free to come back with any other questions or issues.
All the best Basel
Hi Basel,
First of all, thanks for building this tool. I have already used it to process several RNA-seq datasets, which went effortlessly.
However, right now I would like to re-process public microarray data available on GEO (such as https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17237). I used the SOFT formatted files to map the probes to the current gene annotation and reformatted the data into an expression matrix (available here: GSE17237.txt).
For this example, the expression values represent the following, according to the SOFT file:
Data were analyzed using the limma package and the R statistical data analysis program (R 2.7.1). Due to some spread in M-values data was scale normalized between arrays at each timepoint. Values in matrix table are given as log2 ratios (test/reference)
When I run clust using the normalisation option -n 6, or -n 0, I get the following error:
/==================================================================\
| Clust |
| (Optimised consensus clustering of multiple heterogenous datasets) |
| Python package version 1.8.12 (2018) Basel Abu-Jamous |
+---------------------------------------------------------------------------+
| Analysis started at: Thursday 07 February 2019 (17:38:45) |
| 1. Reading dataset(s) |
| 2. Data pre-processing |
Traceback (most recent call last):
  File "/software/shared/apps/x86_64/clust/1.8.12/bin/clust", line 10, in
    sys.exit(main())
  File "/shared/clssoft/apps/x86_64/clust/1.8.12/lib/python2.7/site-packages/clust/main.py", line 98, in main
    args.cs, args.np, args.optimisation, args.q3s, args.basemethods, args.deterministic)
  File "/shared/clssoft/apps/x86_64/clust/1.8.12/lib/python2.7/site-packages/clust/clustpipeline.py", line 97, in clustpipeline
    OGsIncludedIfAtLeastInDatasets=OGsIncludedIfAtLeastInDatasets)
  File "/shared/clssoft/apps/x86_64/clust/1.8.12/lib/python2.7/site-packages/clust/scripts/preprocess_data.py", line 465, in calculateGDMandUpdateDatasets
    Xnew[l][ogi] = np.log2(np.sum(np.power(2.0, Xloc[l][np.in1d(OGsDatasets[l], og)]), axis=0))
AttributeError: 'float' object has no attribute 'log2'
Could it be that there is a problem when the input data is already log-transformed?
Thanks in advance, Emmelien