Closed erblast closed 7 years ago
Thanks again! I've left few comments on code (above) for corrections. Also, I built your functions and ran following, only to receive an error:
d <- datasets::mtcars
fit <- gamlss(d, vs ~ hp + wt, family = gamlss.dist::BI())
predict(fit, data = d)
#> Error in data.frame(data, source = namelist) :
#> arguments imply differing number of rows: 32, 33
Does this replicate for you?
sorry for the sloppy code from the last commit, i thought everything went fine on my computer but i am just getting used to package building functionality of Rstudio and i must have missed something. Especially the bug you found really annoyed me, because i could not figure out what was going on. This stackoverflow post
leads me to believe that gamlss:::predic.gamlss()
searches the local namespace for a variable with the same name as the data variable that was passed to gamlss()
to create the model object. I now attached the data to the model object and then pass it to gamlss:::predic.gamlss()
. And this fixes the issue for me.
I do not understand now why this travis CI build test is failing. Can you make something of it? Do you think attaching the data to the model will unnecessarily blow up memory usage? How does that relate to pipelearner where I can find the original data in every row for each cross validation sample. Does R recongnise that it is storing the same object and always point to the same location in memory?
No worries. Problems like these are exactly one of the reasons why I created twidlr! For other examples, see this gist I started as a resource for twidlr motivations (code here to be run without twidlr). I've just added this gamlss issue (after working out what is going on).
The problem is that gamlss:::predict.gamlss
is calling eval(Call$data)
when newdata
is used without data
. This searches for the original data in the global environment which, as far as I'm concerned, is not ideal.
I played with few ideas without digging much deeper into gamlss:::predict.gamlss
, but think that your solution of attaching the data is best for the moment. It may be a necessary evil for reliable results!
I have some ideas for improvement, however. First, to save some space, rather than attaching the entire original data frame, just use the variables included in the model as follows:
original_data <- data[all.vars(formula)]
Then attach this as an attribute rather than a new object in the list as follows:
attr(object, "original_data") <- original_data
Then, in predict, recover this original data via:
original_data <- attr(object, "original_data")
and feed this into the gamlss:::predict
function.
For example of using attributes, see here.
See how you go implementing this, but don't worry if it's not exact. I'm happy to clean any details after your next push.
Also, please add some notes in the documentation about what is going so users don't confuse data
(for new data), with data
in the original predict function.
Sorry, forgot to answer some other questions:
:::
. See #16 for my thoughts on this.$data
and points to this with resample objects. If you have an example you're thinking about, please email me and I'll take a look.Thanks for your quick reply, I was not aware of the issues from other packages that you mention in your gist at all. Its not really intuitive for me that a built model should depend on the original data at all, especially for regression models.
Also thanks for posting the threads about the :::
issue, this also cleared up some things for me. I will implement the changes you suggested.
I had to bring the original_dataframe and the new_dataframe to the same dimensions in order for predict.gamlss()
to work.
I am not sure what you want me to add as documentation. If you accidently pass newdata to twidlr::predict.gamlss()
it will throw an error. Sould this be mentioned in the documentation (the help page?) or as comments in the actual code?
done
Merged! Congrats and thanks again for another nice contribution :)
I cleaned the code in 3fc3d56d4a2b173c4a57008d51a02c357e09fce8 , which you can take a look at if you're planning to add anything else.
added gamlss