Chicago / food-inspections-evaluation

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.
http://chicago.github.io/food-inspections-evaluation/
Other
410 stars 130 forks source link

Add a couple packages to the initial setup script. #72

Closed orborde closed 9 years ago

orborde commented 9 years ago

Prior to this change, glmnet (and possibly other packages) wasn't installed until 30_glmnet_model.R actually got run, which is mildly annoying.

Tested by nuking my local R package directory and rerunning the pipeline from startup through 30_glmnet_model.R.

(Trivial pull request because this is the first time I've made one on Github, and because your contributions guide says I have to sign some kind of agreement.)

orborde commented 9 years ago

Clicked some buttons and got the agreement signed. LMK if anything goes haywire.

geneorama commented 9 years ago

I think it's a nice idea to install glmnet up front, even though it's against the philosophy

The idea behind geneorama::loadinstall_libraries is that libraries are loaded as needed and installed if missing.

So, if you wanted to create a new script that uses randomForest you wouldn't need to worry about updating the installation script. Other people would be able to run your alternative "30" script so long as you use geneorama::loadinstall_libraries libraries to load randomForest in your alternative model script.

I think this approach is convenient and pretty non-invasive for most users, since it only installs missing packages.

I acknowledge that this approach is limited because

  1. There is no ability to check / control package versions
  2. You might not want the package in your main library, and it doesn't have a library argument
  3. The function's name is preposterous (I named it, and I hate the name)

I think it would be a lot friendlier to use a project library to install any packages that are missing from the main library, but that's complicated if you're not in a project directory... I've never gotten around to implementing a solution because managing R's libraries is a little annoying / complicated, and I'm still not sure of the best route.

orborde commented 9 years ago

Good point about keeping the dependencies local to the scripts that actually make use of them; having to go update a global setup script for a new method would indeed be annoying and error-prone.

On the other hand, the reasons I can think of to have a single unified setup script are:

  1. You can run it once when you're connected to internet, and no longer need a network connection after you've done so. But this has never actually been a problem for me personally, so I'm not inclined to weight that concern highly.
  2. You can start the setup script, walk away for some coffee, and come back to a setup that doesn't require you to do any more waiting to use.

Neither of the above are super-compelling, though. Fundamentally, a single setup script reduces the friction involved in getting started, but maybe not by enough to be worth the maintenance headache.

There is no ability to check / control package versions

I acknowledge that this could theoretically be an issue, but it's complicated enough to deal with that I don't think it's worth bothering with until it actually causes a problem for someone.

You might not want the package in your main library, and it doesn't have a library argument

...

I think it would be a lot friendlier to use a project library to install any packages that are missing from the main library, but that's complicated if you're not in a project directory.

I'm not very familiar with the R conventions; having said that, this doesn't seem like a big deal to me. I'm perfectly happy to have a single, global R library in my homedir. From a "minimal setup friction" perspective, I do have to note that I had to create and configure a directory for that R library.

At any rate, I don't have a strong opinion either way at this point whether to close this PR or accept it, so go ahead and do whichever you think is best, and I'll merge accordingly.

geneorama commented 9 years ago

I completely agree that it's mildly annoying to have to install glmnet once you run the 30 script, but I'd rather stick to the pattern of only installing the installers in the startup script.