Open sinaneza opened 7 years ago
If you want to have the fitted model that you see in ggplot2, you will have to fit it yourself, rather than rely on ggplot2 to do it.
As for "fit then predict", a small example showing your problem is needed.
Hi, Sina. See R for Data Science by Hadley Wickham and Garrett Grolemund
@sjackman Thank you very much Shawn I need to read through the links. In the case of having troubles with the instructions, I will let you know by a comment on this issue.
Dear @jennybc
To my point of view, your suggestion is just applicable when you want to fit the model to the interval in which your data frame is spanned (interpolation).
However, in the case that I want to find an information for a value outside the interval in which data is spanned, predict()
fails.
The code is as follow:
rm(list = ls())
library("tidyverse")
mtcars
as_tibble(mtcars)
model <- loess(wt ~ hp, data=mtcars, normalize = FALSE)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=100)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y <- pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_line(aes(x = x, y= y), stat = "identity", data = select(loess.DF, c(x, y)))
# geom_smooth(aes_auto(loess.DF), data=loess.DF, stat="identity")
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_smooth()
xseq2 <- seq(from = xrange[2], to = xrange[2]+300, length = 80)
pred2 <- predict(model, newdata2 = data.frame(hp = xseq2), se = TRUE)
y2 <- pred2$fit
y2
loess.DF2 <- data.frame(x = xseq2, y = y2)
As far as I understand it, the same code should work for both interpolation and extrapolation, in that you'll get a prediction. Whether the prediction makes any sense in the real world depends on your model.
Could you please copy-and-paste the error message that you're seeing? Use https://github.com/tidyverse/reprex
Dear @sjackman
The error is:
Error in data.frame(x = xseq2, y = y2) : arguments imply differing number of rows: 80, 32
Which means that it had not computed wt
for the the xseq2
since their size are totally different. Infact, model$fit
gives us exactly the same vector as y2
in the previously given code
@sjackman
In fact, in the case that it had computed wt
correctly for xseq2
, it should have the same length as xseq2
.
Try using modelr::add_predictions
Dear @sjackman
To my understanding so far, the issue was not related to predict()
and add_predict()
. The issue was the model selection, as I selected the "polynomial function" for my model it was able to do extrapolation. However, changing the model to "B-spline" and "local regression", extrapolation gives 'N/A' as the result.
BTW, I was wondering to know whether "local regression" and "B-Spline" methods are just applicable to interpolation or extrapolation is also applicable to them
The applied code to find the mentioned issue is as follow:
rm(list = ls())
library("tidyverse")
mtcars
as_tibble(mtcars)
model <- loess(wt ~ hp, data=mtcars)
model2 <- lm(wt ~ splines::bs(hp, 3), data = mtcars)
model3 <- lm(wt ~ poly(hp, 3), data = mtcars)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=100)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y <- pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_line(aes(x = x, y= y), stat = "identity", data = select(loess.DF, c(x, y)))
# geom_smooth(aes_auto(loess.DF), data=loess.DF, stat="identity")
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_smooth()
xseq2 <- seq(from = xrange[2], to = xrange[2]+300, length = 80)
pred2 <- predict(model, newdata2 = data.frame(hp = xseq2), se = TRUE)
y2 <- pred2$fit
y2
loess.DF2 <- data.frame(x = xseq2, y = y2)
XSeq_DF <- data_frame(hp = xseq)
XSeq2_DF <- data_frame(hp = xseq2)
library(modelr)
add_predictions(XSeq_DF, model)
add_predictions(XSeq2_DF, model)
add_predictions(XSeq_DF, model2)
add_predictions(XSeq2_DF, model2)
A <- add_predictions(XSeq_DF, model3)
add_predictions(XSeq2_DF, model3)
predict(model3, XSeq2_DF)
X_DF <- data_frame(hp = seq(from = xrange[1], to = xrange[2]+300, length = 200)) %>%
add_predictions(model3)
filter(X_DF, hp>=335)
ggplot(data = mtcars, aes(x = hp, y = wt)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ poly(x, 3)) +
geom_line(data = A,
aes(x = hp, y = pred),
colour = "red") +
geom_line(data = filter(X_DF, hp < 400),
aes(x = hp, y = pred),
colour = "green")
The issue was the model selection, as I selected the "polynomial function" for my model it was able to do extrapolation. However, changing the model to "B-spline" and "local regression", extrapolation gives 'N/A' as the result.
Ah, I see. Yes it seems reasonable that some models may be incapable of extrapolation. I'm afraid that I'm out of my comfort zone here though. Sorry. Did polynomial function
work for you, then?
BTW, I was wondering to know whether "local regression" and "B-Spline" methods are just applicable to interpolation or extrapolation is also applicable to them
I'm afraid that I don't know the answer. :pensive:
When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate – so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors.
See https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.loess.html
Dear all
I have a major trouble in extrapolating by a model (
loess()
,lm()
, ...). I know that interpolation is possible usingpredict()
. However, I am really experiencing a trouble with extrapolation.Furthermore, I don't know how I can extract the function by which the curve in
geom_smooth()
is drawn.I would appreciate if anyone could help me with these issues.
@jennybc @daattali @sjackman @oganm