Open ZanMervic opened 3 weeks ago
Code may assume that values of categorical variables are unique. The bug is thus in discretization. Adding np.unique, as I suggested in a comment in #6876, resolves it.
I nevertheless made #6878 to prevent construction of variables with duplicated values, so any future bugs that result in duplicated values will be reported earlier, at the appropriate place.
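For illustration only, here is a minimal sketch of the two ideas in this comment. It is not Orange's actual code: `dedupe_values` and `check_unique_values` are hypothetical helpers, and the input is assumed to be a plain list of value names. The first applies np.unique to drop repeated names; the second fails early when duplicates are present, in the spirit of #6878.

```python
import numpy as np


def dedupe_values(values):
    """Return the distinct value names (hypothetical helper).

    Note that np.unique sorts its output, so the original order is not kept.
    """
    return [str(v) for v in np.unique(values)]


def check_unique_values(values):
    """Raise ValueError if any value name occurs more than once (hypothetical helper)."""
    uniq, counts = np.unique(values, return_counts=True)
    dupes = [str(v) for v in uniq[counts > 1]]
    if dupes:
        raise ValueError(f"duplicated values: {', '.join(dupes)}")
    return tuple(values)
```

Because np.unique returns sorted output, code that depends on the order of values would need a different, order-preserving deduplication.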
What's wrong?
This issue is related to the Discretization issue #6876. If the input to the Continuize widget has attributes with several values that share the same "name"/"value" (see issue #6876 for a better explanation), one-hot encoding creates multiple attributes with the same name, which results in an exception.
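A rough sketch of how the name clash arises, under stated assumptions: the `name=value` pattern for indicator columns, the helper names `one_hot_names` and `duplicated`, and the example values are all made up for illustration and do not come from the widget's code.

```python
from collections import Counter


def one_hot_names(var_name, values):
    """Build one indicator-column name per value (assumed naming pattern)."""
    return [f"{var_name}={v}" for v in values]


def duplicated(names):
    """Return the names that occur more than once, in first-seen order."""
    return [name for name, count in Counter(names).items() if count > 1]


# Made-up variable with a repeated value name.
names = one_hot_names("PC1", ["< 0.5", "< 0.5", ">= 0.5"])
clashes = duplicated(names)
```

Here `names` has three entries but only two distinct names, so any step that requires unique attribute names would reject it.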
Workflow I used (an extension of the workflow from issue #6876):
Exception:
Screenshot of the raised exception and the two attributes with the same name:
Note
Because of this issue, a test was failing for the ScoringSheet widget. I have temporarily excluded the widget from the test, but it should be included again when the issue is resolved.
Test:
Orange.tests.test_classification.LearnerAccessibility.test_all_models_work_after_unpickling_pca
How can we reproduce the problem?
Zip of the workflow: continuize_bug.zip
To reproduce the problem, set the PCA components to 8 in the provided workflow.
What's your environment?