biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

Idea: show model from Curve Fit or Linear Regression in Scatter Plot together with point cloud #6403

Open wvdvegte opened 1 year ago

wvdvegte commented 1 year ago

What's your use case? Scatter Plot can already show a regression line, but it would be nice if the model produced by Curve Fit (or by Linear Regression with different parameter settings) could be shown alongside the data.

Are there any alternative solutions? Not that I'm aware of.

janezd commented 1 year ago

If I understood your suggestion correctly, you can already use the Predictions widget to add a column with the model's predictions and then plot this column vs. the actual values in Scatter Plot.

This will only show the points, but the curve cannot be shown in general anyway. We could connect points, in which case we'd just have a useless jagged curve, or we could smooth it, which would be a model on top of a model.

wvdvegte commented 1 year ago

I am aware that it is possible to show predicted vs. actual values, or a residuals plot to show just the error, but that puts the emphasis on the accuracy of the predictions without showing the resulting curve.

Let's say this is a scatter plot of my target as a function of my feature:

[image]

Using a curve fit of the form target = p1 + p2 / (feature + p3), I could visualize the resulting curve like this:

[image]

Now I would already be happy if I could show the second image on top of the first, each in a different colour. It doesn't have to be a continuous line; I see your point about not wanting to put a model on top of a model. Unfortunately, Scatter Plot can show only one variable on the y axis at a time. So perhaps I should rephrase my idea as "allow two variables as y values in Scatter Plot at the same time, using different colours (or other style characteristics)". Actually, I can see more useful applications for that than the one I brought up here (visualizing the values on the predicted curve together with the actual values).

Edit: to explain why it is useful to display the predictions on top of the actual values, especially in the case of Curve Fit: from the parameterized function description, in this case target = p1 + p2 / (feature + p3), it is not always clear what the resulting curve will look like. This is especially true for more complex functions. Being able to compare the predicted curve with the actual values directly will make it easier to iteratively tweak the function description so that it follows the actual values more closely.
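To illustrate the idea outside Orange, here is a minimal sketch that fits the function from this comment and evaluates it on a dense grid, which yields a smooth curve directly from the fitted function rather than from connected prediction points. It uses SciPy's `curve_fit`, not the Curve Fit widget itself; the synthetic data, the starting values, and the bound on p3 are assumptions made only for the example.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(feature, p1, p2, p3):
    """Parameterized function from the thread: p1 + p2 / (feature + p3)."""
    return p1 + p2 / (feature + p3)


# Synthetic stand-in data (hypothetical; not the data behind the screenshots).
rng = np.random.default_rng(0)
feature = np.linspace(1.0, 10.0, 50)
target = model(feature, 2.0, 5.0, 1.0) + rng.normal(0.0, 0.02, feature.size)

# Assumption: keep p3 >= 0 so the pole at feature = -p3 stays outside the data.
lower = [-np.inf, -np.inf, 0.0]
upper = [np.inf, np.inf, np.inf]
params, _ = curve_fit(model, feature, target, p0=[1.0, 1.0, 1.0],
                      bounds=(lower, upper))

# Evaluate the fitted function on a dense grid to obtain a smooth curve.
grid = np.linspace(feature.min(), feature.max(), 200)
curve = model(grid, *params)
```

With matplotlib, `plt.scatter(feature, target)` followed by `plt.plot(grid, curve)` would draw the point cloud and the fitted curve in one figure.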

janezd commented 1 year ago

We discussed this.

  1. Scatter Plot in principle shows two independent variables. Even showing a regression line does not really belong in Scatter Plot (this is why we have the option to treat the variables as independent).
  2. This seems to be useful mostly with Curve Fit, not with general models.

That said, Scatter Plot can already show a linear regression line, but no other curves. So we thought that this functionality could be added to Curve Fit instead; that is, the widget could have a graph with a preview of the fit on the particular data. What about that?

wvdvegte commented 1 year ago

Crossed my mind, too. Good idea!

borondics commented 1 year ago

In general it would be great to see confidence intervals along with a fitted model somewhere. To me it seems more appropriate to have this in Curve Fit but I would be equally happy to keep the Linear Model in Scatter Plot and add the "confidence intervals" as @markotoplak suggested, with boosting.

wvdvegte commented 1 year ago

Just discovered there is a workaround to get what I was looking for:

[image: Predictions vs Actuals]

Of course, this doesn't add confidence intervals, but in a similar way we could concatenate further variables: "prediction + confidence interval/2" and "prediction - confidence interval/2".
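The screenshot is not reproduced here, so the following is only a sketch of what such a concatenation might look like, assuming the workaround stacks actual and predicted values as rows of a single y variable and adds a label that Scatter Plot can use for colour. The function name `stack_series`, the series labels, and the numbers are hypothetical; `interval_width` stands for the full width of a confidence interval per point.

```python
def stack_series(feature, actual, predicted, interval_width=None):
    """Return long-format rows (feature, value, series).

    interval_width is the full width of a (hypothetical) confidence
    interval per point; the bands are predicted +/- interval_width / 2.
    """
    rows = [(x, y, "actual") for x, y in zip(feature, actual)]
    rows += [(x, y, "predicted") for x, y in zip(feature, predicted)]
    if interval_width is not None:
        rows += [(x, y + w / 2, "upper")
                 for x, y, w in zip(feature, predicted, interval_width)]
        rows += [(x, y - w / 2, "lower")
                 for x, y, w in zip(feature, predicted, interval_width)]
    return rows


# Made-up values, only to show the shape of the result.
rows = stack_series(
    feature=[1.0, 2.0, 3.0],
    actual=[4.6, 3.5, 3.3],
    predicted=[4.5, 3.7, 3.25],
    interval_width=[0.5, 0.5, 0.5],
)
```

Each row carries one y value plus its series label, so a plot that colours points by `series` shows actual values, predictions, and both bands against the same feature axis.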

Another, similarly useful visualization would be to show the regression line from Logistic Regression in Scatter Plot after binary classification based on two numerical variables. If I've understood it correctly, in that case the regression line is the best possible line to separate the two classes, and it would be nice to be able to visualize it. Perhaps more suitable for the Educational add-on, though.
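For two numerical variables, the line in question can be computed from the model coefficients. In a logistic model P(class 1) = sigmoid(b0 + b1*x1 + b2*x2), the decision boundary is the set of points where the predicted probability equals 0.5, that is, where b0 + b1*x1 + b2*x2 = 0. It separates the predicted classes, which is not necessarily the best separating line in every sense. The sketch below uses hypothetical coefficients and function names and does not use Orange's API.

```python
import math


def probability(b0, b1, b2, x1, x2):
    """Logistic model: P(class 1) = sigmoid(b0 + b1*x1 + b2*x2)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))


def boundary_x2(b0, b1, b2, x1):
    """x2 on the P = 0.5 line, where b0 + b1*x1 + b2*x2 = 0 (needs b2 != 0)."""
    return -(b0 + b1 * x1) / b2


# Hypothetical coefficients, as if read from a fitted model.
b0, b1, b2 = -3.0, 1.0, 2.0

# Points on the boundary line for a few x1 values.
line = [(x1, boundary_x2(b0, b1, b2, x1)) for x1 in (0.0, 1.0, 2.0)]
```

Drawing a straight line through the points in `line` on top of the x1 vs. x2 scatter plot would show where the model switches from one class to the other.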