BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets

ValueError: could not convert string to float #60

Closed · nat2bee closed this 3 years ago

nat2bee commented 3 years ago

I'm getting the following error when trying to run clust:

clust /home/nsa/HS_intra/clust_data -d 6 -m Orthogroups.tsv -r Replicates.txt -n Normalisation.txt

/===========================================================================\
|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.12.0 (2019) Basel Abu-Jamous           |
+---------------------------------------------------------------------------+
| Analysis started at: Tuesday 27 October 2020 (14:02:55)                   |
| 1. Reading dataset(s)                                                     |

Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nsa/miniconda3/envs/clust/bin/clust", line 10, in <module>
    sys.exit(main())
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/clust/main.py", line 101, in main
    clustpipeline.clustpipeline(args.datapath, args.m, args.r, args.n, args.o, args.K, args.t,
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/clust/clustpipeline.py", line 86, in clustpipeline
    (X, replicates, Genes, datafiles) = io.readDatasetsFromDirectory(datapath, delimiter='\t| |, |; |,|;', skiprows=1, skipcolumns=1,
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/clust/scripts/io.py", line 46, in readDatasetsFromDirectory
    datafilesread = readDataFromFiles(datafileswithpath, delimiter, float, skiprows, skipcolumns, returnSkipped)
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/clust/scripts/io.py", line 204, in readDataFromFiles
    X[l] = pdreadcsv_regexdelim(datafiles[l], delimiter=delimiter, dtype=dtype, skiprows=skiprows,
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/clust/scripts/io.py", line 239, in pdreadcsv_regexdelim
    result = pd.read_csv(StringIO('\n'.join(re.sub(delimiter, '\t', str(x)) for x in f)),
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/pandas/io/parsers.py", line 1196, in read
    ret = self._engine.read(nrows)
  File "/home/nsa/miniconda3/envs/clust/lib/python3.8/site-packages/pandas/io/parsers.py", line 2155, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1147, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: could not convert string to float: 'TRINITY_DN401_c0_g1_i10_685_0.701307.p4'

Does that mean I can only use numbers as gene IDs? Figure 8 of the program documentation shows a file using other characters. Is there a specific standard for gene names, besides that they "should not include spaces, commas, or semicolons"?
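For reference, my data files look roughly like this (values made up here, and shown space-aligned for readability; the real files are tab-separated), with Trinity transcript IDs in the first column and one numeric column per sample:

Genes                  sample1  sample2  sample3
TRINITY_DN401_c0_g1    5.2      4.8      6.1
TRINITY_DN402_c0_g1    0.0      1.3      0.9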

Thank you!

nat2bee commented 3 years ago

Hi, just an update. I've renamed all my genes, but in the end I don't think that was the problem. I believe the real issue was that the data folder must contain only the count matrices to be clustered and nothing else; I had other files in the same folder, and the error was presumably coming from those. So I'm closing this.
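In case it helps anyone hitting the same error, here is a minimal sketch of the failure mode as I read the traceback above: readDatasetsFromDirectory in io.py treats every file in the data directory as a dataset and casts all non-ID columns to float, so any stray non-matrix file in that folder triggers the ValueError. The path below is just my example, and plain tabs stand in for clust's regex delimiter:

import glob
import os
import pandas as pd

# Check each file in the data directory the way clust's reader would:
# skip the header row (skiprows=1), treat the first column as gene IDs
# (skipcolumns=1), and require everything else to be numeric.
data_path = "/home/nsa/HS_intra/clust_data"

for datafile in sorted(glob.glob(os.path.join(data_path, "*"))):
    try:
        # Simplification: clust accepts several delimiters via a regex;
        # plain tab is assumed here.
        df = pd.read_csv(datafile, sep="\t", header=None, skiprows=1)
        df.iloc[:, 1:].astype(float)  # all non-ID columns must be numeric
        print("looks like a valid dataset:", datafile)
    except (ValueError, TypeError) as err:
        # A non-numeric cell outside the gene-ID column (e.g. from a map,
        # replicates, or normalisation file left in the folder) fails here
        # with "could not convert string to float".
        print("would break clust:", datafile, "->", err)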

Thanks!

BaselAbujamous commented 3 years ago

Hey! Very happy to see this has been resolved! Please let me know if you need help with anything else.