fmohr / lcdb

[LCDB 2.0] KNNWorkflow ValueError when classes are not integers #18

Closed by Deathn0t 9 months ago

Deathn0t commented 9 months ago

The following command triggers a ValueError:

lcdb test -id 6 -w lcdb.workflow.sklearn.KNNWorkflow -m -vs 42 -ts 42 -ws 42 --parameters '{"metric": "minkowski", "n_neighbors": 3, "pp@cat_encoder": "ordinal", "pp@decomposition": "ka_nystroem", "pp@featuregen": "none", "pp@featureselector": "none", "pp@scaler": "minmax", "weights": "uniform", "p": 5, "pp@kernel_pca_kernel": "linear", "pp@kernel_pca_n_components": 0.25, "pp@poly_degree": 2, "pp@selectp_percentile": 25, "pp@std_with_std": true}'

Output:

Traceback (most recent call last):
  File "/Users/romainegele/Documents/Research/LCDB/lcdb/publications/2023-neurips/lcdb/controller.py", line 158, in build_curves
    self.compute_metrics_for_workflow()
  File "/Users/romainegele/Documents/Research/LCDB/lcdb/publications/2023-neurips/lcdb/controller.py", line 265, in compute_metrics_for_workflow
    predictions, labels = self.get_predictions()
                          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/romainegele/Documents/Research/LCDB/lcdb/publications/2023-neurips/lcdb/controller.py", line 283, in get_predictions
    keys[f"y_pred_{label_split}"] = self.workflow.predict(X_split)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/romainegele/Documents/Research/LCDB/lcdb/publications/2023-neurips/lcdb/workflow/_base_workflow.py", line 43, in predict
    y_pred = self._predict(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/romainegele/Documents/Research/LCDB/lcdb/publications/2023-neurips/lcdb/workflow/sklearn/_knn.py", line 87, in _predict
    return self.learner.predict(X)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/romainegele/Documents/Argonne/deephyper-scikit-learn/sklearn/neighbors/_classification.py", line 258, in predict
    probabilities = self.predict_proba(X)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/romainegele/Documents/Argonne/deephyper-scikit-learn/sklearn/neighbors/_classification.py", line 336, in predict_proba
    probabilities = ArgKminClassMode.compute(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/romainegele/Documents/Argonne/deephyper-scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py", line 579, in compute
    unique_Y_labels=np.array(unique_Y_labels, dtype=np.intp),
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'A'
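
The last frame of the traceback casts the unique class labels to `np.intp`. A minimal sketch of that failure in isolation, assuming string labels such as 'A' (the label values here are illustrative, not taken from dataset 6):

```python
import numpy as np

# Hypothetical string class labels, standing in for the dataset's targets.
unique_labels = ["A", "B", "C"]

caught = False
try:
    # Same kind of cast as in the final traceback frame: strings -> np.intp.
    np.array(unique_labels, dtype=np.intp)
except ValueError as exc:
    caught = True
    print(f"ValueError: {exc}")
```

This only illustrates the cast itself; it does not exercise the KNN code path or the lcdb workflow.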

Deathn0t commented 9 months ago

The suggested fix, https://github.com/fmohr/lcdb/commit/89eb65ec8bb782b4589e36cdeda18a0097dac72f, converts the class labels to integers when the dataset is loaded, so the prediction step no longer has to cast string labels such as 'A' to np.intp.
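
A rough sketch of the idea behind this kind of fix, using `np.unique` to map labels to integer codes. This is an assumption about the general approach, not the code from the linked commit; the array `y` and the variable names are hypothetical:

```python
import numpy as np

# Hypothetical target column with non-integer class labels.
y = np.array(["A", "B", "A", "C"])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the index of each element of y within that sorted array.
classes, y_encoded = np.unique(y, return_inverse=True)

# The integer codes can be mapped back to the original labels when needed.
y_decoded = classes[y_encoded]

print(y_encoded)  # integer codes, one per sample
print(y_decoded)  # original labels recovered from the codes
```

Keeping `classes` around lets predictions be translated back to the original labels for reporting.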