biolab / orange3

🍊 Orange: Interactive data analysis
https://orangedatamining.com

Prediction of direct and loaded nn models is different #6091

Closed: WoifeG closed this issue 2 years ago

WoifeG commented 2 years ago

What's wrong? We have a 55-dimensional dataset and choose 5 features to build a (neural network) model, ignoring the remaining 50 dimensions. When testing the model via the Predictions widget, we get vastly different results depending on the source of the model: first, the source is the Neural Network widget directly; second, we save the model and load it again (see the attached ows). In our opinion there should be no difference in the output of the Predictions widget, but as you can see, there is a difference in recall of ~0.15, which should not occur.

Interestingly enough, the results are correct and equal if the remaining 50 features are set as meta variables in the Select Columns widget instead of being ignored.
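For reference, the two Select Columns configurations can be reproduced in code. This is a minimal sketch assuming Orange3's public Table/Domain API, with the bundled heart_disease dataset and a 5-features/rest split standing in for the reporter's 55-feature data:

```python
# A minimal sketch of the two "Select Columns" configurations, assuming
# Orange3's public Table/Domain API. heart_disease and the 5/rest split are
# stand-ins for the reporter's 55-feature dataset.
from Orange.data import Domain, Table

data = Table("heart_disease")
keep = data.domain.attributes[:5]   # the 5 features used by the model
rest = data.domain.attributes[5:]   # the remaining features

# Variant 1: ignore the remaining features (they are dropped entirely).
ignored = data.transform(Domain(keep, data.domain.class_vars))

# Variant 2: keep the remaining features as metas; this is the configuration
# for which the reporter sees correct, equal predictions.
as_metas = data.transform(Domain(keep, data.domain.class_vars, metas=rest))

print(len(ignored.domain.attributes), len(as_metas.domain.metas))
```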

[screenshot: Predictions widget output showing the differing recall]

How can we reproduce the problem? Attached you will find the sample ows file and the dataset used: orange_model_bug.zip

This may be a case similar to #5548.

What's your environment?

markotoplak commented 2 years ago

Thanks for the report. In this case the output scores should indeed be the same.

ajdapretnar commented 2 years ago

I tried to replicate this on iris and on your example and I get the same probabilities in both cases.

Clemens9 commented 2 years ago

Could you please provide your workflow and your environment? That would be interesting for us. We also tried to replicate it with the heart disease dataset and were not able to reproduce the bug. (I am also part of the group investigating this issue.)

markotoplak commented 2 years ago

It might not be a bug in the model; it could also be a bug in Orange's data processing.

When models are saved, their domain is saved as well. That domain includes feature transformations, which may then be applied differently or (as a bug) even applied twice. Here I see only "Edit Domain" as something that works with feature transformations. Could you explain what you did there?

Also, could you try to make a smaller workflow demonstrating the issue?
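For context, saving and loading a model in Orange is a pickle round trip. A hedged sketch of that path, assuming Orange's NNClassificationLearner and the bundled iris dataset rather than the reporter's exact setup:

```python
# A hedged sketch of the save/load round trip that the Save Model / Load Model
# widgets perform (they pickle the fitted model, domain included). Assumes the
# bundled iris dataset, not the reporter's data.
import pickle

from Orange.classification import NNClassificationLearner
from Orange.data import Table

data = Table("iris")
model = NNClassificationLearner()(data)   # model straight from the learner

with open("model.pkcls", "wb") as f:      # Save Model writes a pickle
    pickle.dump(model, f)
with open("model.pkcls", "rb") as f:      # Load Model reads it back
    loaded = pickle.load(f)

# Without the bug, direct and loaded models agree on every prediction.
print((model(data) == loaded(data)).all())
```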

Clemens9 commented 2 years ago

In "Edit Domain" the order of the values of the variable cluster was changed, which is only important for visualisation. The bug remains, when I remove the "Edit Domain" widget.

Clemens9 commented 2 years ago

orange_model_bug_small.zip

Here is a smaller version of the workflow. One thing I noticed in particular is that the "Feature Constructor" widget is somehow responsible for the differing prediction results.

[screenshot: reduced workflow including the Feature Constructor widget]

markotoplak commented 2 years ago

Thanks! Oh yes, I missed Feature Constructor; that is certainly the one that introduces variable computation.

HannesLum commented 2 years ago

So, I did some digging and found the following behaviour in the Orange code. When the model is passed to the predictor directly, it first transforms the data; most notably, in Orange/base.py line 377, `self.original_domain.attributes != data.domain.attributes` evaluates to False (as it should, I think, since the two domains are supposed to be the same). Orange then performs the data transformation successfully and continues with the calculation.

Now, if the same is done with the loaded model, two things are different. First, `self.original_domain.attributes != data.domain.attributes` evaluates to True because, although the variables are exactly the same, the objects are not: they were created anew when the model was loaded. This doesn't change the behaviour of the program much; the only consequence is that the data transformation is split into two parts. First the data is transformed to `self.domain` (without normalizations etc., i.e. the variables have no `compute_value=Normalizer(...)` attribute); afterwards, the transformation to `self.original_domain` is done as before. The problem is the first transformation: when a variable has been created via the Feature Constructor, the transformation in Orange/base.py line 381 produces NaN values for the whole column, which of course results in very wrong predictions.
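To illustrate the mechanism described above (not the exact code path in base.py), here is a hedged sketch of Orange's compute_value machinery; the "sepal ratio" variable is hypothetical:

```python
# Sketch of compute_value-based domain transformation, assuming Orange3's
# public Table/Domain API. A derived variable with compute_value is filled in
# by transform(); a lookalike variable *without* compute_value (as when the
# link to the computation is lost) yields a column of NaNs instead.
import numpy as np
from Orange.data import ContinuousVariable, Domain, Table

iris = Table("iris")

# Roughly what Feature Constructor produces: a variable whose values are
# computed from the source table (hypothetical "sepal ratio").
ratio = ContinuousVariable(
    "sepal ratio",
    compute_value=lambda data: data.X[:, 0] / data.X[:, 1],
)
with_ratio = iris.transform(
    Domain(iris.domain.attributes + (ratio,), iris.domain.class_vars)
)
print(np.isnan(with_ratio.X[:, -1]).any())  # False: values were computed

# The failure mode: a lookalike variable with no compute_value cannot be
# derived from the source data, so the whole column becomes NaN.
broken = ContinuousVariable("sepal ratio")
no_ratio = iris.transform(
    Domain(iris.domain.attributes + (broken,), iris.domain.class_vars)
)
print(np.isnan(no_ratio.X[:, -1]).all())  # True: unknowns everywhere
```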

The bug therefore lies either in the way the model (or rather its constructed features) is stored, or in how constructed features are transformed. The second case is possible because a constructed feature may not be transformed at all when the corresponding attribute objects are identical, so the transformation is simply skipped in the direct case. If the transformation happens in both cases, then there must be a problem with the storage of constructed variables.

markotoplak commented 2 years ago

Thanks! What you describe is standard Orange behavior :) but there seems to be a bug in the variables constructed by Feature Constructor. That widget has been worked on quite a bit since the last release.

Also, we have been changing how equality between features is computed.
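A minimal sketch of why the comparison flips after loading, assuming the bundled iris dataset; the exact outcome of the equality check depends on the Variable.__eq__ semantics of the Orange version in use (which is what was being changed):

```python
# Pickling recreates the Variable objects, so object identity is lost and the
# domain comparison falls back to Variable.__eq__. Its semantics were in flux
# at the time of this issue, so the second result is version-dependent.
import pickle
from Orange.data import Table

domain = Table("iris").domain
restored = pickle.loads(pickle.dumps(domain))

print(restored.attributes[0] is domain.attributes[0])  # typically False: new objects
print(restored.attributes == domain.attributes)        # depends on __eq__ semantics
```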

Could you verify if you get the same problem with the code from PR #6002?

HannesLum commented 2 years ago

Glad to help! :)

Regarding the verification, I can't do it right now: setting up the environment directly via setup.py seems to require Visual C++, which in turn needs admin rights that I don't have on my work laptop. I might have a better idea of how to do this on Wednesday, but for now the bed calls.

HannesLum commented 2 years ago

Hello again. A colleague verified that the problem is solved by PR #6002. Thank you for the solution!

markotoplak commented 2 years ago

Thank you for preparing such an easy test case.

I tried your workflow, and I am now closing this issue because it is already fixed in the master branch. Thanks for bringing this to our attention.