Chicago / food-inspections-evaluation

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.
http://chicago.github.io/food-inspections-evaluation/

Source of weather data? #97

Closed: greensam closed this issue 7 years ago

greensam commented 8 years ago

I am working on replicating and extending this analysis for a course project (running it on updated data), but I can't find any indication of the source of the weather data. Where is it from?

geneorama commented 7 years ago

@greensam Sorry about the delay in responding. We're using internal data purchased from darksky.net (https://darksky.net/) through their API, which used to be called forecast.io.

I don't have a code example for downloading directly from their API, because we pull the data on an automated schedule and store it in a database. However, you could download it from their API yourself.

In another project we've been using NOAA data, which is free if you register for an API key.

Here's a code example to get you started:

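get_noaa_content <- function(base_url, LIMIT = 1000, param = NULL, token = NULL){
    ## Pages through a NOAA CDO API endpoint LIMIT records at a time, stopping
    ## when a page comes back with fewer than LIMIT records.
    result <- list()
    i <- 0
    repeat {
        i <- i + 1
        # cat("httr get request number", i, "\n")
        param_limits <- paste0("limit=", sprintf("%d", LIMIT),
                               "&", "offset=", sprintf("%d", (i - 1) * LIMIT))
        url <- paste0(base_url, "?", param, "&", param_limits)
        ## The CDO API expects the access token in a request header named "token"
        response <- httr::GET(url, httr::add_headers(token = token))
        httr::stop_for_status(response)
        ## The parsed body has "metadata" and "results"; an empty page has no
        ## "results", so $results is NULL (length 0) and the loop stops
        page <- httr::content(response)$results
        result <- c(result, page)
        if (length(page) < LIMIT) break
    }
    return(result)
}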

## This example requires data.table

library(data.table)
## Code example to download 2011-04-01 to 2012-04-01.
## Change the startdate and enddate values in the param string for the dates
## you need.
## The dataset ID is "GHCND" and the station ID is "GHCND:USW00094846", which
## is Chicago O'Hare International Airport. Not all stations have the same
## depth of history, but this one has good coverage for modern times.
##
## You'll get weather values and weather "attribute" information. I don't use
## the attributes; they are flags attached to each value (measurement, quality
## and source flags, e.g. a marker for trace amounts of precipitation).
##
## This assumes that you have the token stored in a file called 
## "weather_noaa_token.txt".  So, when you register for the API token, just 
## save the token in a file with nothing else.
## 
dat <- get_noaa_content(base_url = "http://www.ncdc.noaa.gov/cdo-web/api/v2/data",
                        param = paste0("datasetid=GHCND&stationid=GHCND:USW00094846&",
                                       "startdate=2011-04-01&enddate=2012-04-01"),
                        token = readLines("weather_noaa_token.txt", warn = FALSE)[1])
dat <- rbindlist(dat)
dat[ , date := as.IDate(date)]
## Reshape the "values" part of the data
dat_values <- dcast(dat, formula = date ~ datatype, value.var = "value",
                    fun.aggregate = c, fill = NA)
## Reshape the "attributes" part of the data
dat_attr <- dcast(dat, formula = date ~ datatype, value.var = "attributes",
                  fun.aggregate = c, fill = NA)
## Prints the head and tail of the "values" then the "attributes"
dat_values
dat_attr
## Some plots to tell that you have something that looks seasonal
dat_values[,plot(TMAX~date, col='red')]
dat_values[,points(TMIN~date, col='blue')]
dat_values[,plot(AWND~date)]

## Compare with the Dark Sky weather data used in this project
weather <- readRDS("DATA/weather_20110401_20141031.Rds")
weather
dat_joined <- merge(weather, dat_values[ , list(TMAX, PRCP, AWND), date], "date")

## The units differ between the two sources, but the values look similar
## enough to give it a shot.
opar <- par(mfrow=c(1,2))
dat_joined[ , plot(log(precipIntensity), log(PRCP), main = "precipitation")]
dat_joined[ , plot(temperatureMax, TMAX, main = "temperature")]
par(opar)
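If the NOAA request omits the `units` parameter, the values should come back in raw GHCN-Daily units (tenths of °C for TMAX/TMIN, tenths of mm for PRCP), which is one reason the scatter plots don't sit on the identity line. Below is a minimal sketch of converting them to the Fahrenheit and inch scales that Dark Sky's default US units presumably use. The helper names are hypothetical. The CDO API documentation also describes a `units=standard` request parameter that asks NOAA to do the scaling itself, which may be simpler.

```r
## Hypothetical helpers (not part of the original example). They assume the
## NOAA request omitted the `units` parameter, so values are in raw GHCN-Daily
## units: tenths of degrees Celsius and tenths of millimetres.
ghcnd_temp_to_f <- function(tenths_c) {
    ## tenths of degrees C -> degrees C -> degrees F
    tenths_c / 10 * 9 / 5 + 32
}
ghcnd_prcp_to_in <- function(tenths_mm) {
    ## tenths of mm -> mm -> inches (25.4 mm per inch)
    tenths_mm / 10 / 25.4
}

ghcnd_temp_to_f(c(0, 100, 200))  ## 32 50 68
ghcnd_prcp_to_in(254)            ## 1
```

With data.table loaded, something like `dat_joined[ , TMAX_F := ghcnd_temp_to_f(TMAX)]` would put the NOAA maximum temperature on the same scale as `temperatureMax`, assuming the Dark Sky data was requested in US units.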

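For a replication on updated data you will likely need more than one year. As far as we know, the CDO API limits daily (GHCND) requests to roughly a one-year date range, so a longer span has to be split into windows. Here is a sketch under that assumption: `year_windows` and `get_noaa_range` are hypothetical helpers, and `get_noaa_range` reuses `get_noaa_content` from the example above.

```r
## Hypothetical helper: split [start, end] into consecutive windows of at most
## one year, assuming the CDO API rejects longer ranges for GHCND data.
year_windows <- function(start, end) {
    start <- as.Date(start)
    end <- as.Date(end)
    stopifnot(start <= end)
    ## Window start dates: start, start + 1 year, start + 2 years, ...
    starts <- seq(start, end, by = "year")
    ## Each window ends the day before the next one starts (the last one at `end`)
    ends <- c(starts[-1] - 1, end)
    data.frame(startdate = format(starts), enddate = format(ends),
               stringsAsFactors = FALSE)
}

## Hypothetical wrapper: one get_noaa_content() call per window, with the
## per-window record lists concatenated into one list of records.
get_noaa_range <- function(base_url, station, start, end, token) {
    windows <- year_windows(start, end)
    pages <- lapply(seq_len(nrow(windows)), function(k) {
        get_noaa_content(base_url = base_url,
                         param = paste0("datasetid=GHCND&stationid=", station,
                                        "&startdate=", windows$startdate[k],
                                        "&enddate=", windows$enddate[k]),
                         token = token)
    })
    do.call(c, pages)
}

year_windows("2011-04-01", "2014-10-31")
```

The result of `get_noaa_range()` has the same shape as the output of `get_noaa_content()`, so it can go straight into `rbindlist()` as in the example above.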

geneorama commented 7 years ago

@greensam Hopefully this helps. I'm closing the issue unless you have further questions.

greensam commented 7 years ago

@geneorama Thanks for the info. We had already found and used the NOAA data; I was only asking to see whether your weather data was more granular (station level or something like that). I appreciate the response!

geneorama commented 7 years ago

@greensam You can get the data at a more granular level, but we didn't do that in this project. I just used a downtown zip code.
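One way to go more granular would be to give each establishment the weather from its nearest station rather than from a single location. Below is a minimal sketch, assuming you have latitude/longitude for each establishment and a table of candidate stations with `id`, `lat` and `lon` columns (for example, built from the CDO API's station metadata). The function names are hypothetical, and this is not what the project did.

```r
## Hypothetical helper: great-circle (haversine) distance in km between points
## given in decimal degrees; vectorised over its arguments.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
    to_rad <- pi / 180
    dlat <- (lat2 - lat1) * to_rad
    dlon <- (lon2 - lon1) * to_rad
    a <- sin(dlat / 2)^2 +
        cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
    ## pmin guards against rounding pushing sqrt(a) slightly above 1
    2 * radius_km * asin(pmin(1, sqrt(a)))
}

## Hypothetical helper: for each establishment, return the id of the closest
## station. `stations` is a data frame with character `id`, numeric `lat`/`lon`.
nearest_station <- function(est_lat, est_lon, stations) {
    vapply(seq_along(est_lat), function(i) {
        d <- haversine_km(est_lat[i], est_lon[i], stations$lat, stations$lon)
        stations$id[which.min(d)]
    }, character(1))
}
```

Each establishment's inspection date and station id could then be joined against per-station daily values. The brute-force loop is fine for a handful of stations; a spatial index would only matter for much larger station lists.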

greensam commented 7 years ago

@geneorama makes sense. Thanks again.