Chicago / west-nile-virus-predictions

Algorithm to predict repeated positive results for West Nile Virus for mosquitoes captured in traps across Chicago.
MIT License

Add documentation files to repo #34

Closed: tomschenkjr closed this issue 7 years ago

tomschenkjr commented 7 years ago
geneorama commented 7 years ago

I'm testing the code that I just committed in #22. I can't run the 10 step because I can't install rgdal or ROracle within the script.

For now I just copied over the one table that I get from Oracle from my hard drive; this isn't something I was worried about testing anyway. I do want to make sure that nothing else errors out, because the commit to master was missing something.

tomschenkjr commented 7 years ago

Does the dev server need it installed? It's not part of this milestone, but I'm wondering whether we should track it and make sure it gets done.

geneorama commented 7 years ago

No, I was just checking that the code runs in a fresh environment. Maybe I should have just re-cloned in production, but I wanted to test that I could actually run the code as a "new user" since we're going to make this public. If there are special circumstances on this side that make it impossible for someone else to run, I want to at least mention it in the readme.

tomschenkjr commented 7 years ago

Ah, ok, got it.

geneorama commented 7 years ago

Right now I'm still waiting for ggplot2 to install (!)

tomschenkjr commented 7 years ago

It does like to install a ton of dependencies.

geneorama commented 7 years ago

Then caret installed... which takes a minute too. Now it's on the NOAA step, which is working! But I need to mention that the user needs to get a token and put it in the untracked/ folder.

geneorama commented 7 years ago

I'm not sure what to do about this part... I use the Oracle trap data to fix a couple of locations that moved, and to get coordinates for several traps that are technically outside of the city (because they're on the perimeter). Without the Oracle data the code won't execute, so people won't be able to come along and simply run the code. I can imagine workarounds, but they're not implemented.

tomschenkjr commented 7 years ago

My initial thought is it’s fine for now.

geneorama commented 7 years ago

If you don't mind, I'm going to add the static ward map to the repo. I get another error on the feature creation without it. I only use it for the map demo, but I don't really want to take it out of the features and create a separate place to calculate it later (and I really don't want to toss the map demo).

Also, I switched to the city geocoder, but then went back after I realized that I still have that issue with the out-of-bounds traps.

The file is 764 kilobytes, so it's not increasing the footprint by too much.

geneorama commented 7 years ago

While trying to replicate, I ran into another problem that hasn't happened before: sometimes the weather is incomplete for the most recent history, which causes NA values. When I normalize the weather data, the whole column becomes NA because I don't ignore NA values. I hadn't encountered this during development because I was always working with observations that were at least a week old.
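To illustrate the pitfall described above, here is a minimal sketch. It is not the repository's code: the project is written in R, where the analogous fix is passing `na.rm = TRUE` to functions such as `mean()` and `sd()`. The sketch assumes the normalization is a z-score, which the thread does not confirm, and the temperature values and function names are hypothetical.

```python
import math

def zscore_naive(values):
    """Standardize without handling missing values: one NaN poisons the mean and sd."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def zscore_ignore_nan(values):
    """Standardize using only observed values; missing entries stay NaN."""
    observed = [v for v in values if not math.isnan(v)]
    n = len(observed)
    mean = sum(observed) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical daily max temperatures; the most recent day is not yet reported.
tmax = [70.0, 75.0, 80.0, float("nan")]

naive = zscore_naive(tmax)       # every entry becomes NaN
fixed = zscore_ignore_nan(tmax)  # only the missing day stays NaN
print(naive)
print(fixed)
```

With the NaN-aware version, the three observed days are still scaled and only the unreported day remains missing, so the gap can be handled downstream instead of wiping out the whole column.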

geneorama commented 7 years ago

I have the test deployment working on the development server.

In an effort to fill in the NAs in recent NOAA daily summaries, I first tried to use WindyGrid / Mongo, but I ran into issues. The biggest problem was that I couldn't get a precipitation measure that looked like the NOAA data. The other issue was that I couldn't find weather for O'Hare; 60666 didn't turn up any data, and neither did the drawn map extent in the actual WindyGrid application. I mention the O'Hare issue in case you want me to open an issue there.

Link to the Mongo downloader: get_mongo_weather.R

I ended up using NOAA's hourly records, which match fairly closely to the daily summaries.

Comparison of aggregated hourly max temp (blue) to daily max temp (orange): [image]

Sometimes the precipitation was off, but the cumsum makes sense, so the trailing averages should be ok (hourly aggregated is again blue): [image] [image]

The average wind speed also checks out: [image] [image]
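The hourly-to-daily comparison described above could be set up roughly as follows. This is a sketch under stated assumptions rather than the code used here: the records, dates, units, and the field names `tmax`, `prcp`, and `awnd` are all hypothetical, and the real pipeline is in R.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical hourly records: (date, temperature, precipitation, wind speed).
hourly = [
    ("2016-07-01", 68.0, 0.0, 5.0),
    ("2016-07-01", 81.0, 0.25, 9.0),
    ("2016-07-01", 74.0, 0.5, 7.0),
    ("2016-07-02", 70.0, 0.0, 4.0),
    ("2016-07-02", 85.0, 0.125, 8.0),
]

def aggregate_daily(records):
    """Collapse hourly rows into one row per day: max temp, total precip, mean wind."""
    by_day = defaultdict(list)
    for day, temp, precip, wind in records:
        by_day[day].append((temp, precip, wind))
    daily = {}
    for day, rows in sorted(by_day.items()):
        temps, precips, winds = zip(*rows)
        daily[day] = {
            "tmax": max(temps),
            "prcp": sum(precips),
            "awnd": sum(winds) / len(winds),
        }
    return daily

daily = aggregate_daily(hourly)

# Running precipitation total, for comparison against the cumsum of the daily summaries.
cum_prcp = list(accumulate(d["prcp"] for d in daily.values()))
print(daily)
print(cum_prcp)
```

Comparing cumulative totals rather than day-by-day values is forgiving of rain that the two sources attribute to adjacent days, which is why a matching cumsum suggests the trailing averages will also line up.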

geneorama commented 7 years ago

@tomschenkjr I'm trying to add WNV_model to https://www.clahub.com/agreements/new, but it's not seeing this repo.
According to clahub I need to be admin, and my role at this organization needs to be publicized. I think both of these are true, right?

tomschenkjr commented 7 years ago

So, we use the “cityofchicago” GitHub account to manage the CLAHub because it’ll make it a little easier to manage in the long run. I can do this in the meantime.

But, right now, it won’t be able to see it (since it’s a private repo). Also, we need to first determine the name of the repo that we’ll use.

After we determine the name and also make it public, I can turn on the CLA.

geneorama commented 7 years ago

Initial drafts of each file have been created, and a license has been added. @tomschenkjr can you please add this to clahub though?

geneorama commented 7 years ago

Need to document the libPath issue in the readme.

tomschenkjr commented 7 years ago

@geneorama - the CLA is now live

geneorama commented 7 years ago

Added quite a bit of documentation to the code, and cleaned up the model code as well. Pushed changes to a dev branch, because I don't want to break master. I want to check that the right predictions are flowing through.