BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets

ValueError: Usecols do not match columns #32

Closed jvelotta closed 5 years ago

jvelotta commented 5 years ago

Hi Basel,

I keep getting this error when running the script using python clust.py (I was not able to execute the program using the first two methods described). Any idea if this is because of a file formatting issue?

Thanks,

Jon

BaselAbujamous commented 5 years ago

Hi Jon,

Can you give some more details on how you run Clust and the error? Can you please copy and paste here the entire terminal output including the error? Are you running it over one or multiple datasets collectively?

I will make sure I provide the assistance required to get Clust running for you.

All the best,
Basel

jvelotta commented 5 years ago

Hi Basel,

Thank you. I am running Clust on cpm normalized RNAseq data. I have a .txt data file (11,104 rows of gene names and 26 columns of individuals), and a replicates file. This is a single dataset.

Below is the code and the error message. The list of integers after the error message goes to 288730!

Thanks for your help.

Jonathans-MacBook-Pro:clust jonathanvelotta1$ python clust-1.8.12/clust.py gastroc_norm_counts.txt -r gastroc_replicates_file.txt

/===========================================================================\
|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.8.12 (2018) Basel Abu-Jamous           |
+---------------------------------------------------------------------------+
| Analysis started at: Friday 22 February 2019 (09:37:39)                   |
| 1. Reading dataset(s)                                                     |
Traceback (most recent call last):
  File "clust-1.8.12/clust.py", line 6, in <module>
    main(args)
  File "/Users/jonathanvelotta1/Dropbox/RWork/peromyscus_ontogeny/clust/clust-1.8.12/clust/main.py", line 98, in main
    args.cs, args.np, args.optimisation, args.q3s, args.basemethods, args.deterministic)
  File "/Users/jonathanvelotta1/Dropbox/RWork/peromyscus_ontogeny/clust/clust-1.8.12/clust/clustpipeline.py", line 84, in clustpipeline
    returnSkipped=True)
  File "/Users/jonathanvelotta1/Dropbox/RWork/peromyscus_ontogeny/clust/clust-1.8.12/clust/scripts/io.py", line 46, in readDatasetsFromDirectory
    datafilesread = readDataFromFiles(datafileswithpath, delimiter, float, skiprows, skipcolumns, returnSkipped)
  File "/Users/jonathanvelotta1/Dropbox/RWork/peromyscus_ontogeny/clust/clust-1.8.12/clust/scripts/io.py", line 205, in readDataFromFiles
    usecols=range(skipcolumns, ncols), na_filter=data_na_filter, comments=comm)
  File "/Users/jonathanvelotta1/Dropbox/RWork/peromyscus_ontogeny/clust/clust-1.8.12/clust/scripts/io.py", line 240, in pdreadcsv_regexdelim
    delimiter='\t', dtype=dtype, header=-1, skiprows=skiprows, usecols=usecols, na_filter=na_filter, comment=comments).values
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1902, in __init__
    _validate_usecols_names(usecols, self.names)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1237, in _validate_usecols_names
    "columns expected but not found: {missing}".format(missing=missing)
ValueError: Usecols do not match columns, columns expected but not found: [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96

jvelotta commented 5 years ago

Update: I uninstalled and reinstalled clust and pandas, and that did not change the error message. Thanks again! J

BaselAbujamous commented 5 years ago

Hi and sorry for being late in replying.

I think I have seen this error before, with a dataset that did not use a proper newline sequence ('\n' or '\r\n') and instead used only the carriage return character '\r'. I don't think that any modern operating system uses '\r' alone as a line ending, since on its own it does not start a new line on those systems.

In other words, to an operating system, your data file looks like a very very very long SINGLE LINE string!
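A quick way to check for this is to count the line-ending bytes in the file. The following is only a minimal sketch: `count_line_endings` is a hypothetical helper written for illustration, not part of Clust, and the sample data is made up. It also shows why the field count explodes: when '\r' is not treated as a line break, splitting on tabs yields one long run of fields.

```python
def count_line_endings(data):
    """Count CRLF, bare LF and bare CR line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Made-up sample: three rows of three tab-separated fields, CR-only endings.
sample = b"gene\ts1\ts2\rg1\t1.0\t2.0\rg2\t3.0\t4.0"

print(count_line_endings(sample))   # {'crlf': 0, 'lf': 0, 'cr': 2}

# Read as a single line, the three rows collapse into 7 tab-separated fields
# instead of 3, because '\r' no longer separates the rows.
print(len(sample.split(b"\t")))     # 7
```

To inspect a real file, read it in binary mode, for example `count_line_endings(open("gastroc_norm_counts.txt", "rb").read())`. A result with only `cr` counts would point to the problem described above. The numbers in the report are consistent with it: assuming 27 tab-separated fields per row and 11,105 rows (11,104 genes plus one header row, which is an assumption), a file read as one line has 11,105 × 26 + 1 = 288,731 fields, so the column indices run up to 288730.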

The solution would be to replace every '\r' in your data file with '\r\n'. I am happy to do it for you if you would like to email me the dataset confidentially at basel.abu-jamous@sensynehealth.com.
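For anyone who prefers to do the conversion themselves, the replacement can be sketched in a few lines of Python. This is an illustrative sketch, not code from Clust: `normalise_newlines` and `convert_file` are hypothetical names, and the output filename in the usage note is an assumption. The pattern matches '\r\n' before a lone '\r', so a file that already has Windows line endings is not given doubled ones.

```python
import re


def normalise_newlines(data, newline=b"\r\n"):
    """Replace every CRLF, bare CR or bare LF in `data` with `newline`."""
    # CRLF is listed first so an existing pair is replaced as one unit.
    return re.sub(b"\r\n|\r|\n", newline, data)


def convert_file(src, dst, newline=b"\r\n"):
    """Write a copy of `src` to `dst` with normalised line endings."""
    # Binary mode keeps Python from translating line endings on its own.
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(normalise_newlines(data, newline))
```

A call such as `convert_file("gastroc_norm_counts.txt", "gastroc_norm_counts_fixed.txt")` writes a corrected copy and leaves the original untouched. Passing `newline=b"\n"` produces Unix-style line endings instead.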

Best wishes and please let me know if any further help is needed :)

Basel

jvelotta commented 5 years ago

Thanks Basel, that did the trick. My .tsv files were just one long line! Thanks again.