Closed LouisSirugue closed 2 years ago
seems to work - one drawback of that solution is that you need to do some sophisticated rounding to make sure your solution lies on the grid of the slider:
a_solution <- round(2 * summary(lm(y ~ x))$coefficients[1, 1]) / 2
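For what it's worth, the same rounding can be written for any slider step. A minimal sketch, where `round_to_step` is a hypothetical helper name and not something taken from the app:

```r
# Hypothetical helper (not from the app): snap a value to the nearest
# multiple of the slider step.
round_to_step <- function(value, step) {
  round(value / step) * step
}

# With step = 0.5 this matches round(2 * value) / 2 from the line above.
round_to_step(2.3, step = 0.5)

# Vectorised, so it can be applied to both coefficients at once, e.g.
# round_to_step(coef(lm(y ~ x)), step = 0.5) for a sample (x, y).
round_to_step(c(2.3, 0.7), step = 0.5)
```

This only moves each coefficient to its nearest grid value; it does not check that the resulting pair is the best-fitting line on the grid.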
also, the values are quite far from what you set in the DGP. like you have
a = 2, b = 1.2
but 2.5, 0.7
in the solution. I don't care about that tbh, as long as it always works. did you try to see where the limits of this code are wrt slider values and the digits rounded in the above solution? other than that, great work!
I was not sure whether you chose the step of the sliders and the parameters of the DGP arbitrarily or not, so I did not modify the DGP and I rounded the solution that way to adapt to the step of the sliders in case you chose these specific values on purpose.
The limit is that, as long as you have to round, you can get a sample for which the line whose rounded parameters are closest to the OLS parameters is not the line with the lowest SSE among the finite set of lines with rounded parameters.
For instance, with a slider step of 0.25 and the following random sample:
set.seed(19)
x <- rnorm(n = 20, mean = 2, sd = 4)
y <- -2 + 1.5 * x + rnorm(n = 20, mean = 0, sd = 1.5)
summary(lm(y ~ x))$coefficients
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -1.118611  0.5433193 -2.058847 5.427976e-02
x            1.172680  0.1527779  7.675716 4.395579e-07
The closest slider inputs to ^b0 and ^b1 would be -1 and 1.25:
sum((-1 + 1.25*x - y)^2)
[1] 64.41663
But conditional on setting b1 to 1.25 instead of ^b1 = 1.172680, -1.25 for b0 gives a lower SSE than -1, even though -1 is closer to ^b0:
sum((-1.25 + 1.25*x - y)^2)
[1] 62.70134
Right now the app is programmed so that the correct answer is the one whose parameters are closest to the OLS parameters, which can lead to a solution being considered correct even though it is not the one with the lowest SSE among the available combinations of slider inputs. We could instead define the correct solution not as the set of parameters closest to the OLS parameters, but as the one that minimizes the SSE among the available combinations of slider inputs. The solution considered correct won't always be the line that best fits the data visually, but in my view it would be the best option. I can do that if you want; right now I just selected seeds that do not generate rounding issues.
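A minimal sketch of that second option, to make it concrete. The helper names (`sse`, `best_on_grid`) and the slider range of -5 to 5 are assumptions for illustration and do not come from the app; only the step of 0.25 and the sample are taken from the example above.

```r
# Hypothetical helper: SSE of the line b0 + b1 * x on the sample (x, y).
sse <- function(b0, b1, x, y) sum((b0 + b1 * x - y)^2)

# Hypothetical helper: exhaustive search over every slider combination.
# b0_grid and b1_grid are the values each slider can take.
best_on_grid <- function(x, y, b0_grid, b1_grid) {
  grid <- expand.grid(b0 = b0_grid, b1 = b1_grid)
  grid$sse <- mapply(sse, grid$b0, grid$b1, MoreArgs = list(x = x, y = y))
  grid[which.min(grid$sse), ]
}

# Same sample as in the example above.
set.seed(19)
x <- rnorm(n = 20, mean = 2, sd = 4)
y <- -2 + 1.5 * x + rnorm(n = 20, mean = 0, sd = 1.5)

# Assumed slider range (-5 to 5); the step of 0.25 is from the example.
step <- 0.25
slider_values <- seq(-5, 5, by = step)

# Option 1 (current behaviour): OLS coefficients rounded to the slider step.
rounded <- round(coef(lm(y ~ x)) / step) * step

# Option 2: the slider combination with the lowest SSE.
best <- best_on_grid(x, y, slider_values, slider_values)

# A seed has a "rounding issue" when the two options disagree.
has_rounding_issue <- best$b0 != rounded[[1]] || best$b1 != rounded[[2]]
```

Because the search covers only the finite set of slider values, it stays cheap (41 x 41 combinations under the assumed range), and the same comparison could be used to screen seeds instead of picking them by hand.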
i understand - that's exactly my point. relying on the random seed could be a risky strategy. anyway, i would not worry further about this now, you invested enough time already.
good to merge for me unless you want to further modify this!
Ok let's stick to that then, and maybe later on I'll program something more sophisticated. Thanks!
Instead of using the parameters of the DGP to turn the squared errors green, use the parameters of the sample fit.