Closed orborde closed 9 years ago
Clicked some buttons and got the agreement signed. LMK if anything goes haywire.
I think it's a nice idea to install glmnet
up front, even though it's against the philosophy
The idea behind geneorama::loadinstall_libraries
is that libraries are loaded as needed and installed if missing.
So, if you wanted to create a new script that uses randomForest
you wouldn't need to worry about updating the installation script. Other people would be able to run your alternative "30" script so long as you use geneorama::loadinstall_libraries
libraries to load randomForest
in your alternative model script.
I think this approach is convenient and pretty non-invasive for most users, since it only installs missing packages.
I acknowledge that this approach is limited because
I think it would be a lot friendlier to use a project library to install any packages that are missing from the main library, but that's complicated if you're not in a project directory... I've never gotten around to implementing a solution because managing R's libraries is a little annoying / complicated, and I'm still not sure of the best route.
Good point about keeping the dependencies local to the scripts that actually make use of them; having to go update a global setup script for a new method would indeed be annoying and error-prone.
On the other hand, the reasons I can think of to have a single unified setup script are:
Neither of the above are super-compelling, though. Fundamentally, a single setup script reduces the friction involved in getting started, but maybe not by enough to be worth the maintenance headache.
There is no ability to check / control package versions
I acknowledge that this could theoretically be an issue, but it's complicated enough to deal with that I don't think it's worth bothering with until it actually causes a problem for someone.
You might not want the package in your main library, and it doesn't have a library argument
...
I think it would be a lot friendlier to use a project library to install any packages that are missing from the main library, but that's complicated if you're not in a project directory.
I'm not very familiar with the R conventions; having said that, this doesn't seem like a big deal to me. I'm perfectly happy to have a single, global R library in my homedir. From a "minimal setup friction" perspective, I do have to note that I had to create and configure a directory for that R library.
At any rate, I don't have a strong opinion either way at this point whether to close this PR or accept it, so go ahead and do whichever you think is best, and I'll merge accordingly.
I completely agree that it's mildly annoying to have to install glmnet
once you run the 30 script, but I'd rather stick to the pattern of only installing the installers in the startup script.
Prior to this change, glmnet (and possibly other packages) wasn't installed until 30_glmnet_model.R actually got run, which is mildly annoying.
Tested by nuking my local R package directory and rerunning the pipeline from startup through 30_glmnet_model.R.
(Trivial pull request because this is the first time I've made one on Github, and because your contributions guide says I have to sign some kind of agreement.)