TypeError: Feature names

jpinus commented 1 year ago

I installed concoct via conda and followed the basic usage instructions (https://concoct.readthedocs.io/en/latest/usage.html). Everything works fine until I run concoct itself:

command: concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b concoct_output/ --thread 12

output:
Up and running. Check [...]/concoct_output/log.txt for progress
Traceback (most recent call last):
  File "[...]/.conda/envs/concoct_env/bin/concoct", line 90, in <module>
    results = main(args)
  File "[...]/.conda/envs/concoct_env/bin/concoct", line 37, in main
    transform_filter, pca = perform_pca(
  File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/concoct/transform.py", line 5, in perform_pca
    pca_object = PCA(n_components=nc, random_state=seed).fit(d)
  File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 435, in fit
    self._fit(X)
  File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 485, in _fit
    X = self._validate_data(
  File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/base.py", line 529, in _validate_data
    self._check_feature_names(X, reset=reset)
  File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/base.py", line 396, in _check_feature_names
    feature_names_in = _get_feature_names(X)
  File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1903, in _get_feature_names
    raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
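The failing check can be sketched in plain Python to show why a mix of integer and string column labels triggers the error. This is a simplified re-creation for illustration only, not scikit-learn's actual code; `feature_name_types`, `check_feature_names`, and the example labels are all hypothetical. It also shows the two fixes the message names: make every label a string, or make none of them strings.

```python
def feature_name_types(columns):
    """Sorted type names of the column labels, as listed in the error message."""
    return sorted({type(c).__qualname__ for c in columns})


def check_feature_names(columns):
    """Hypothetical, simplified mirror of the check described in the TypeError.

    Returns the labels if they are all strings, None if none of them are
    strings (no names are stored), and raises TypeError for a str/non-str mix.
    """
    if not columns:
        return None
    types = feature_name_types(columns)
    if len(types) > 1 and "str" in types:
        raise TypeError(
            "Feature names are only supported if all input features have "
            f"string names, but your input has {types} as feature name / "
            "column name types."
        )
    if types == ["str"]:
        return list(columns)
    return None


if __name__ == "__main__":
    mixed = [0, 1, 2, "sample1"]  # hypothetical mix of int and str labels
    try:
        check_feature_names(mixed)
    except TypeError as err:
        print("mixed labels:", err)

    # Fix 1: convert every label to str (the pandas equivalent is
    # X.columns = X.columns.astype(str), as the error message suggests).
    print("all str:", check_feature_names([str(c) for c in mixed]))

    # Fix 2: no string labels at all, so no feature names are kept.
    print("all int:", check_feature_names([0, 1, 2, 3]))
```

In a real fix the conversion would have to happen on the DataFrame that concoct passes to `PCA(...).fit(d)`; where that DataFrame gets its mixed labels is not shown in this thread.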

jakob-wirbel commented 1 year ago

I had the same issue with a fresh conda install of concoct. I think the problem is the scikit-learn (sklearn) version, which is too new in a fresh install.

For the conda install that did not work, I got the following output:

python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> sklearn.__version__
'1.2.1'
>>> quit()

Instead, I used a Singularity container with a working version of concoct:

singularity shell docker://quay.io/biocontainers/concoct:1.1.0--py27h88e4a8a_0
python
Python 2.7.15 | packaged by conda-forge | (default, Jul  2 2019, 00:39:44) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> sklearn.__version__
'0.20.3'
>>> quit()

I suggest adding an upper bound on the sklearn version (or pinning the exact sklearn version that gets installed).
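Until such a bound is in place, a quick way to check whether an environment is likely affected is to compare the installed scikit-learn version against a cutoff. This is a minimal sketch under an assumption: the cutoff of 1.2 is inferred from this thread (1.1.0 and 0.20.3 work, 1.2.1 fails), not taken from concoct's packaging. `FIRST_BAD`, `major_minor`, and `sklearn_ok` are hypothetical names; `importlib.metadata.version` is the standard-library call (Python 3.8+).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: first scikit-learn (major, minor) release that raises the
# feature-name TypeError, inferred from the versions reported in this thread.
FIRST_BAD = (1, 2)


def major_minor(ver: str) -> tuple:
    """Return (major, minor) from a version string such as '1.2.1' or '0.20.3'."""
    nums = []
    for part in ver.split(".")[:2]:
        match = re.match(r"\d+", part)  # keep leading digits, e.g. '2rc1' -> 2
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 2:
        nums.append(0)
    return tuple(nums)


def sklearn_ok(ver: str) -> bool:
    """True if this scikit-learn version is below the assumed cutoff."""
    return major_minor(ver) < FIRST_BAD


if __name__ == "__main__":
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        print("scikit-learn is not installed")
    else:
        status = "ok" if sklearn_ok(installed) else "likely affected by the TypeError"
        print(f"scikit-learn {installed}: {status}")
```

Comparing only (major, minor) keeps the sketch free of third-party dependencies; a packaging-level fix would instead put the bound in the conda recipe.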

jakob-wirbel commented 1 year ago

The problem seems to come from this line (shown here in an older version of sklearn): https://github.com/scikit-learn/scikit-learn/blob/ffc0f66676b4835eb1bdd3f3ecab025e9c1be9fe/sklearn/utils/validation.py#L1859

It works with the following conda environment:

name: concoct
channels:
  - conda-forge
  - bioconda
dependencies:
  - scikit-learn=1.1.0
  - concoct=1.1.0

BinPro / CONCOCT, issue #323: TypeError: Feature names