Open jpinus opened 1 year ago
I had the same issue with a new install of concoct via conda. I think the problem comes from the version of sklearn, which is too recent in a fresh install.
For the conda install that did not work, I got the following output:
python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> sklearn.__version__
'1.2.1'
>>> quit()
I instead used a singularity container that had a working version of concoct:
singularity shell docker://quay.io/biocontainers/concoct:1.1.0--py27h88e4a8a_0
python
Python 2.7.15 | packaged by conda-forge | (default, Jul 2 2019, 00:39:44)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> sklearn.__version__
'0.20.3'
>>> quit()
I suggest adding an upper bound for the sklearn package (or specifying exactly which version of sklearn gets installed).
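To make the idea of an upper bound concrete, here is a minimal sketch of a version check. It assumes the cutoff is 1.2 only because 1.2.1 failed and 1.1.0 and 0.20.3 worked in the sessions reported here; the exact first failing release has not been verified, and the helper names are hypothetical.

```python
def parse_version(version):
    """Turn a string like '1.2.1' into a tuple of ints, e.g. (1, 2, 1).

    Parsing stops at the first component that has no leading digits,
    so suffixes such as 'rc1' or 'dev0' are ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Assumption: 1.2 is used as the bound only because 1.2.1 failed and
# 1.1.0 worked in this thread; the true cutoff may differ.
ASSUMED_UPPER_BOUND = (1, 2)


def is_below_assumed_bound(version):
    """Return True if the version is older than the assumed upper bound."""
    return parse_version(version) < ASSUMED_UPPER_BOUND


# The three versions mentioned in this thread.
checks = {v: is_below_assumed_bound(v) for v in ("1.2.1", "1.1.0", "0.20.3")}
```

In a real environment the same check could be run as `is_below_assumed_bound(sklearn.__version__)` after `import sklearn`.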
The problem seems to be this line here in an older version of sklearn:
https://github.com/scikit-learn/scikit-learn/blob/ffc0f66676b4835eb1bdd3f3ecab025e9c1be9fe/sklearn/utils/validation.py#L1859
It works with these packages installed:
name: concoct
channels:
- conda-forge
- bioconda
dependencies:
- scikit-learn=1.1.0
- concoct=1.1.0
I installed concoct via conda and followed the basic usage (https://concoct.readthedocs.io/en/latest/usage.html). Everything is going fine until I run concoct:
command: concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b concoct_output/ --thread 12
output:
Up and running. Check [...]/concoct_output/log.txt for progress
Traceback (most recent call last):
File "[...]/.conda/envs/concoct_env/bin/concoct", line 90, in <module>
results = main(args)
File "[...]/.conda/envs/concoct_env/bin/concoct", line 37, in main
transform_filter, pca = perform_pca(
File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/concoct/transform.py", line 5, in perform_pca
pca_object = PCA(n_components=nc, random_state=seed).fit(d)
File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 435, in fit
self._fit(X)
File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 485, in _fit
X = self._validate_data(
File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/base.py", line 529, in _validate_data
self._check_feature_names(X, reset=reset)
File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/base.py", line 396, in _check_feature_names
feature_names_in = _get_feature_names(X)
File "[...]/.conda/envs/concoct_env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1903, in _get_feature_names
raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
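The error message can be illustrated with a small sketch. It is a simplified, pure-Python stand-in for the kind of check the message describes, not scikit-learn's actual code; the function name `check_column_labels` and the example labels are hypothetical, and nothing here is taken from concoct's real column names. The list comprehension plays the role of the `X.columns = X.columns.astype(str)` fix suggested in the message.

```python
def check_column_labels(labels):
    """Reject column labels that mix str with any other type.

    Simplified stand-in for the validation described in the error
    message; returns the sorted list of label type names on success.
    """
    type_names = sorted({type(label).__qualname__ for label in labels})
    if len(type_names) > 1 and "str" in type_names:
        raise TypeError(f"mixed column label types: {type_names}")
    return type_names


# Hypothetical labels: two integer column names and one string name.
mixed = [0, 1, "length"]

try:
    check_column_labels(mixed)
    error = None
except TypeError as exc:
    error = str(exc)

# Converting every label to str (the analogue of
# X.columns.astype(str) on a DataFrame) makes the check pass.
as_strings = [str(label) for label in mixed]
result = check_column_labels(as_strings)
```

All-integer or all-string labels pass this check; only the mix of `str` with another type is rejected, which matches the `['int', 'str']` reported in the traceback.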