biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

Confidence intervals for regression lines #5722

Open kaimikael opened 2 years ago

kaimikael commented 2 years ago

What's your use case?

The Scatter Plot widget lets the user add a regression line to the scatter plot. However, there is no indication of the uncertainty of the displayed regression. I would like to be able to display x% confidence intervals.

What's your proposed solution?

A checkbox and number box to add confidence intervals for given percentages. (And the same for standard errors/deviations.)

Are there any alternative solutions?

No easy alternative comes to mind.

janezd commented 2 years ago

We have discussed this at some length at today's meeting.

The Scatter Plot widget is already quite heavy. We hesitate to add more unless it really belongs there.

The problem is -- it doesn't. Scatter Plot basically shows two - in principle - independent variables. The line through the group could also be vertical. Currently, the widget has a checkbox for whether to treat the variables as independent or not, and computes the "regression" line accordingly. The default is to treat y as dependent, which is, imho, wrong, but it accommodates the user's (inappropriate) expectation.
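To illustrate why the choice of dependent variable matters, here is a minimal sketch in plain Python. It is not the widget's code; the function name `ols_slope_intercept` and the sample data are made up for illustration. It fits a least-squares line once with y as dependent and once with x as dependent, and expresses both as slopes in the same (x, y) plane.

```python
from statistics import fmean


def ols_slope_intercept(x, y):
    """Least-squares line y = intercept + slope * x (y is the dependent variable)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.6, 3.4, 5.1]

# y regressed on x: y = a_yx + b_yx * x
b_yx, a_yx = ols_slope_intercept(x, y)

# x regressed on y: x = a_xy + b_xy * y, i.e. y = (x - a_xy) / b_xy in the plot
b_xy, a_xy = ols_slope_intercept(y, x)
slope_xy_in_plane = 1.0 / b_xy

print(f"y on x: slope {b_yx:.3f}")
print(f"x on y: slope {slope_xy_in_plane:.3f} (drawn in the same plane)")
```

The product `b_yx * b_xy` equals the squared correlation coefficient, so the two lines coincide only when the points lie exactly on a line.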

Besides, it doesn't stop with confidence intervals. There's a bunch of other things one might want to show. But this would turn the Scatter Plot widget into Linear Regression widget.

Our decision was thus to not implement this in Scatter Plot. There is already a widget called Linear Regression, which could have a plot with all the bells and whistles one can imagine, and which could also output all residuals and whatever other stuff.

P.S. In the spirit of what the scatter plot is, I would much prefer removing the lines and showing contours of a 2D Gaussian (hm?) distribution and principal components. This would be particularly nice when there are multiple (colored) groups.
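A rough sketch of what the P.S. describes, under the assumption that the contours come from a Gaussian fitted via the sample covariance matrix: the eigenvalues of that 2x2 matrix give the variances along the principal components, and the angle of the first component gives the orientation of the contour ellipses. The function name `principal_axes` is hypothetical and this is plain Python, not Orange code.

```python
import math
from statistics import fmean


def principal_axes(x, y):
    """Return (center, (var_major, var_minor), angle) for 2D data.

    angle is the direction of the first principal component in radians,
    measured counter-clockwise from the x axis.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

    # Closed-form eigen-decomposition of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    var_major = half_trace + radius
    var_minor = max(half_trace - radius, 0.0)  # guard against rounding below zero
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (var_major, var_minor), angle


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.6, 3.4, 5.1]
center, (var_major, var_minor), angle = principal_axes(x, y)

# Semi-axes of the k-sigma contour ellipse of the fitted Gaussian.
k = 2.0
semi_major, semi_minor = k * math.sqrt(var_major), k * math.sqrt(var_minor)
print(center, semi_major, semi_minor, math.degrees(angle))
```

With multiple colored groups, the same computation would be repeated per group, giving one ellipse and one pair of axes for each.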

ajdapretnar commented 2 years ago

Instead of closing this, I would keep it and convert it to a feature request: Linear Regression visualization. Or something.

janezd commented 2 years ago

You're right. I opened a new issue, #5733, though, and referred to this one.

markotoplak commented 2 years ago

Perhaps we could use bootstrap to show the uncertainty in the estimation of the regression lines. And because it is a visualization, it fits this widget perfectly. Like this:

[image: bootstrap]

This was an example in Štrumbelj's bootstrap lecture.
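A minimal sketch of the bootstrap idea, assuming a case-resampling bootstrap with ordinary least-squares fits (the lecture example may have differed). The names `fit_line`, `bootstrap_lines` and `percentile_band` are hypothetical, and this is plain Python rather than Orange code. Each resample yields one line; drawing all of them semi-transparently gives the visual impression of uncertainty, and pointwise percentiles give an approximate band.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Least-squares fit; returns (slope, intercept) of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_lines(x, y, n_boot=200, seed=0):
    """Fit one line per resample of the (x, y) pairs, drawn with replacement."""
    if len(set(x)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(x)
    lines = []
    while len(lines) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:  # degenerate resample: slope is undefined, draw again
            continue
        lines.append(fit_line(xs, ys))
    return lines


def percentile_band(lines, grid, level=0.95):
    """Pointwise percentile band of the bootstrapped predictions on a grid of x values."""
    tail = (1 - level) / 2
    band = []
    for gx in grid:
        preds = sorted(intercept + slope * gx for slope, intercept in lines)
        lo = preds[int(tail * (len(preds) - 1))]
        hi = preds[int((1 - tail) * (len(preds) - 1))]
        band.append((gx, lo, hi))
    return band


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.6, 3.4, 5.1, 5.8, 7.4, 7.7]
lines = bootstrap_lines(x, y, n_boot=200, seed=0)
band = percentile_band(lines, grid=[1.0, 4.5, 8.0], level=0.95)
for gx, lo, hi in band:
    print(f"x={gx}: [{lo:.2f}, {hi:.2f}]")
```

The percentile indices are truncated rather than interpolated, which is adequate for a visual band but coarser than a proper quantile estimate.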

janezd commented 2 years ago

I think the computation is not much of a problem here. The question is whether to add even more to this widget. Next month we add a non-linear regression curve, followed in February by a new method of variable selection (VizRank is more appropriate for independent variables, while for dependent ones we should sort variables by correlation coefficient) ...

kaimikael commented 2 years ago

I wonder, could it be done so that there are specialised widgets that do one thing each and the output of these can be combined in one meta-visualiser?

markotoplak commented 2 years ago

We could remove a feature to add one. I'd remove the option of not treating variables as independent (which does not fit into a scatter plot) and add bootstrapped lines.

kaimikael commented 2 years ago

Thinking more about it, wouldn't a Combine Plot widget be a good thing? That would allow, e.g., combining a Line Plot with a Bar Plot, which is a fairly common combination:

[image]

janezd commented 2 years ago

Treat variables should be indented.

janezd commented 2 years ago

This should be solved within #5733, so I am closing this as a separate issue.

markotoplak commented 1 year ago

@borondics just suggested he'd like something like the above bootstrapped version in a scatterplot.