Closed: JackDC-12 closed this issue 4 years ago.
You need to instantiate the model correctly.
Use this in your function:
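```python
import numpy as np


def orange_learner_accuracy(x_train, x_test, y_train, y_test, learner):
    # get_table is your existing helper that builds an Orange Table from arrays
    table_train = get_table(x_train, y_train)
    table_test = get_table(x_test, y_test)
    # train the model with the learner
    classifier = learner(table_train)
    # predict on the whole test Table (not table_test.X), so the model can
    # transform the data the same way it did during training
    prediction = classifier(table_test)
    # classification accuracy: fraction of correctly predicted test rows
    ca = np.sum(table_test.Y == prediction) / y_test.shape[0]
    return ca
```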
I have renamed some variables for clarity.
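The accuracy line in the function is simply the share of test rows whose predicted class matches the true class. A minimal NumPy sketch with made-up labels (hypothetical arrays, encoded as class-value indices the way Orange stores them) shows that `np.mean` on the comparison gives the same number:

```python
import numpy as np

# Hypothetical true and predicted class indices for 5 test rows.
y_true = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
y_pred = np.array([0.0, 1.0, 0.0, 0.0, 1.0])

# Fraction of rows where the prediction equals the true class.
ca_sum = np.sum(y_true == y_pred) / y_true.shape[0]
ca_mean = np.mean(y_true == y_pred)  # equivalent, slightly shorter

print(ca_sum, ca_mean)
```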
You can also simplify your code a lot. There is a helper function for converting pandas DataFrames to Orange.data.Table: `from Orange.data.pandas_compat import table_from_frame`.
You can also compute the scores you need directly with Orange. Read more about this in the documentation: https://docs.biolab.si//3/data-mining-library/tutorial/classification.html#classification
Thank you for the quick reply! If I understood correctly, the catch is to call the classifier on table_test instead of table_test.X. After doing this, I got an error because the domains of table_train and table_test were not the same. Apparently, they have to be the exact same domain instance; it is not enough to instantiate two equal domains. After this fix, everything worked as expected. Thank you for the pandas-to-Table helper, I was not aware of it! Last question: how are discrete variables transformed so that the logistic regression learner can handle them? Is there an automatic one-hot encoding?
Thanks a lot for your help! Giacomo
Describe the bug
I use a single feature to classify a binary target variable with Orange.classification.LogisticRegressionLearner() in a Python program. If I define the feature as Continuous, everything is fine; if I define it as Discrete, a ValueError is raised: ValueError: X has 1 features per sample; expecting 4. Note that 4 is the number of possible values of the discrete variable: if I use only 2 values, the error says expecting 2, and so on.
To Reproduce
I attached a zip with the Python program (just a small test) and the CSV file: test.zip
Orange version: 3.23
Expected behavior
The error should not appear, and the result should be consistent with the one given by the GUI.
Screenshots
The error stack trace.
Operating system: Windows 10
Additional context
I guess that Orange calls the scikit-learn library directly for logistic regression. In that case, how are discrete variables handled?
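The numbers in the error are consistent with a one-hot expansion: a discrete feature with 4 values becomes 4 indicator columns before the scikit-learn model is fitted, while the raw `X` passed at prediction time still has a single column of value indices. The NumPy sketch below only illustrates that shape mismatch; `one_hot` is a hypothetical helper, not Orange's implementation.

```python
import numpy as np


def one_hot(codes, n_values):
    """Expand a column of value indices into n_values indicator columns."""
    codes = np.asarray(codes, dtype=int)
    encoded = np.zeros((codes.shape[0], n_values))
    # Set a 1 in each row at the column given by that row's value index.
    encoded[np.arange(codes.shape[0]), codes] = 1.0
    return encoded


# A single discrete feature with 4 possible values, stored as indices 0..3.
raw = np.array([0, 2, 3, 1]).reshape(-1, 1)
encoded = one_hot(raw[:, 0], n_values=4)

# A model fitted on `encoded` expects 4 columns; `raw` only has 1.
print(raw.shape, encoded.shape)  # (4, 1) (4, 4)
```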