biolab / orange3

🍊 Orange: Interactive data analysis
https://orangedatamining.com

OWPredictions: support two sparse inputs. #3987

Closed alexandre-j-silva closed 5 years ago

alexandre-j-silva commented 5 years ago

Is your feature request related to a problem? Please describe.
I built a model to cluster items, but when I feed it new data, it raises an error: TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Describe the solution you would like
The predictive model should work on the new data.

Describe the alternatives you considered
None

Additional context
I am a beginner and would like to learn more about this tool. I used Google Translate, so I apologize for any errors.


Traceback (most recent call last):
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\canvas\scheme\widgetsscheme.py", line 1082, in process_signals_for_widget
    widget.handleNewSignals()
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\widgets\evaluate\owpredictions.py", line 262, in handleNewSignals
    self._call_predictors()
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\widgets\evaluate\owpredictions.py", line 279, in _call_predictors
    pred, prob = self.predict(slot.predictor, self.data)
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\widgets\evaluate\owpredictions.py", line 618, in predict
    return cls.predict_discrete(predictor, data)
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\widgets\evaluate\owpredictions.py", line 625, in predict_discrete
    return predictor(data, Model.ValueProbs)
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\classification\base_classification.py", line 22, in __call__
    prediction = super().__call__(data, ret=ret)
  File "C:\Users\55119\AppData\Local\Orange\lib\site-packages\Orange\base.py", line 236, in __call__
    and not np.isnan(data.X).all():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

OrderedDict([
  ('data', [[24x1=1, 95=1, ex=1, excess=1, lev=1, ...] {- PNEU 24X1.95 EXCESS EX PR LEV}, [1=1, 26x1=1, 2x2=1, manga=1, pneu=1, ...] {- PNEU 26X1.1/2X2 MANGA TURBO}, [1=1, 26x1=1, 2x2=1, 91=1, k=1, ...] {- PNEU 26X1.1/2X2 K 91}, [26x1=1, 95=1, levorim=1, pneu=1] {LEVORIM - PNEU 26X1.95}, [26x1=1, 816=1, 95=1, k=1, pneu=1, ...] {- PNEU 26X1.95 K 816}, ... ]),
  ('fix_dim', <function Model.__call__.<locals>.fix_dim at 0x00000146A8394F28>),
  ('one_d', False),
  ('ret', 2),
  ('self', LogisticRegressionClassifier(skl_model=LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1.0, l1_ratio=None, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)) # params={'penalty': 'l2', 'dual': False, 'tol': 0.0001, 'C': 1, 'fit_intercept': True, 'intercept_scaling': 1.0, 'class_weight': None, 'random_state': None, 'solver': 'liblinear', 'max_iter': 100, 'multi_class': 'ovr', 'verbose': 0, 'n_jobs': 1})
])
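
The failing call in the traceback is np.isnan(data.X), and the issue title points at sparse inputs. The minimal sketch below is not Orange's actual fix. It only illustrates, assuming data.X is a scipy.sparse matrix, why NumPy's isnan rejects such an input and how an all-NaN check could be written to accept both dense and sparse data. The helper name all_nan is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def all_nan(X):
    """Return True if every entry of X is NaN (dense array or sparse matrix).

    Hypothetical helper for illustration; not part of Orange.
    """
    if sp.issparse(X):
        X = X.tocsr()
        n_cells = X.shape[0] * X.shape[1]
        # Entries that are not stored are implicit zeros, and zero is not NaN.
        # So every cell must be stored and every stored value must be NaN.
        return X.nnz == n_cells and bool(np.isnan(X.data).all())
    return bool(np.isnan(X).all())


# A small sparse matrix standing in for a bag-of-words table.
X_sparse = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0]]))

try:
    # NumPy ufuncs do not understand the sparse container itself.
    np.isnan(X_sparse)
except TypeError as exc:
    print("np.isnan on a sparse matrix failed:", type(exc).__name__)

print(all_nan(X_sparse))                      # sparse input, handled
print(all_nan(np.array([[np.nan, np.nan]])))  # dense input, handled
```

The design choice here is to inspect only the stored values (X.data) instead of converting the matrix to a dense array, which could be costly for a large vocabulary.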

ajdapretnar commented 5 years ago

Could you please attach a screenshot of the workflow and perhaps some canonical data set with which it is possible to reproduce the issue (iris, housing, something from single cell data sets?).

ajdapretnar commented 5 years ago

Also please report your version of Orange.

alexandre-j-silva commented 5 years ago

Version: 3.22.0

[Image: screen]

Attached files: pneu.xlsx (CORPUS), Tinta_S.xlsx (CORPUS 1)

ajdapretnar commented 5 years ago

Nice find! This can be reproduced with any example where BoW is constructed for Corpus 1 and Corpus 2 separately.

On another note, this is not the correct (optimal) way of constructing a predictive modeling workflow for text. It should be like this:

[Image: Screen Shot 2019-08-28 at 14 26 49]

Orange will automatically apply Preprocess Text and Bag of Words to the second data set. I tried this with your data and it works (but of course in the prediction data you only have one class, so there's an error for that).
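
To illustrate why the training-time Bag of Words should be reused for the new data instead of being rebuilt, here is a minimal plain-Python sketch. It is not Orange's implementation: the function names fit_vocabulary and transform and the toy documents are invented for this example. It shows that two separately built vocabularies produce different columns, while applying the training vocabulary to new documents keeps the columns aligned with what the model was trained on.

```python
from collections import Counter


def tokenize(text):
    """Very simple whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def fit_vocabulary(docs):
    """Collect the sorted set of tokens seen in the given documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def transform(docs, vocabulary):
    """Count tokens per document, using only the columns in `vocabulary`."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # Tokens outside the vocabulary are ignored; missing tokens count as 0.
        rows.append([counts[tok] for tok in vocabulary])
    return rows


# Toy documents, invented for this sketch.
train_docs = ["red apple", "green apple pie"]
new_docs = ["green pear", "red apple"]

vocab_train = fit_vocabulary(train_docs)
vocab_new = fit_vocabulary(new_docs)

# Built separately, the two feature spaces do not match.
print(vocab_train == vocab_new)

# Reusing the training vocabulary keeps the columns aligned with the model.
X_new = transform(new_docs, vocab_train)
print(vocab_train)
print(X_new)
```

In this sketch, the unseen token "pear" is simply dropped, because the model has no column for it.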

Separately, the Corpus widget should also allow setting the target variable in the widget itself. I will open an issue for this.

ajdapretnar commented 5 years ago

For others who want to reproduce this, the following workflow fails:

[Image: Screen Shot 2019-08-28 at 14 30 36]

Use any dataset from Corpus.

alexandre-j-silva commented 5 years ago

Thank you for the tips. Just to be sure I understand: is this a bug, or am I using the tool incorrectly?

ajdapretnar commented 5 years ago

The crash should not happen; that is a bug. Still, your workflow should look like this:

[Image: Screen Shot 2019-08-28 at 14 26 49]

It is a better way of predicting on new data.

alexandre-j-silva commented 5 years ago

OK thank you