Closed: csethna closed this issue 6 years ago
In this one-time edition of Zombie Datasets, we examined two underutilized City of Chicago datasets.
We determined that Chicago Public Library data is underutilized not because it isn't useful, but because of its infrequent (monthly) update cadence. We also determined that the movies dataset adds very few records each year, and because the movies are available from other outlets, the use case for the individual dataset could be perceived as limited.
The frequency of computer sessions per branch could indicate which branches get more traffic, but it is also influenced by other factors such as location, square footage, and the availability of public access terminals. A more useful way of examining this data is to aggregate it over several years and determine the statistically "busiest" months of the year for computer use. Fields exist to determine peak computer-use times by branch. This information is relevant in deciding whether additional funding should be allocated for more terminals (by comparing average capacity against maximum capacity). It could also help identify the best and worst times to update software or hardware, so that upgrades inconvenience the fewest patrons.

We also examined the number of holds placed per branch and the number of holds fulfilled per branch. This data allows us to determine which branches have the highest "flake out" rate among patrons. Using GIS, it is possible to search for a geographic pattern in flakiness, though other factors could contribute to the phenomenon.
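The cross-year monthly aggregation described above can be sketched with Pandas, the tool chosen for this project. This is a minimal sketch on made-up data; the column names (`month`, `branch`, `cybernavigator_sessions`) are assumptions, and the real field names on the data portal may differ.

```python
import pandas as pd

# Hypothetical rows modeled loosely on the CPL computer-sessions dataset.
sessions = pd.DataFrame({
    "month": ["2015-01", "2015-07", "2016-01", "2016-07", "2015-03", "2016-03"],
    "branch": ["Austin", "Austin", "Austin", "Austin", "Uptown", "Uptown"],
    "cybernavigator_sessions": [120, 340, 130, 360, 200, 210],
})

# Aggregate across years: group by calendar month regardless of year,
# then rank months by their average session count.
sessions["month_of_year"] = pd.to_datetime(sessions["month"]).dt.month
busiest = (
    sessions.groupby("month_of_year")["cybernavigator_sessions"]
    .mean()
    .sort_values(ascending=False)
)
print(busiest.index[0])  # calendar month with the highest average sessions
```

Grouping on the calendar month (rather than the year-month pair) is what lets several years of data reinforce a seasonal pattern instead of being read as separate periods.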
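The "flake out" rate comparison could be computed by joining the two holds datasets on branch. Again a sketch on invented numbers; the column names `holds_placed` and `holds_filled` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-branch hold counts standing in for the two datasets.
placed = pd.DataFrame({"branch": ["Austin", "Uptown", "Sulzer"],
                       "holds_placed": [500, 800, 650]})
filled = pd.DataFrame({"branch": ["Austin", "Uptown", "Sulzer"],
                       "holds_filled": [450, 600, 640]})

holds = placed.merge(filled, on="branch")
# "Flake out" rate: share of placed holds that were never picked up.
holds["flake_rate"] = 1 - holds["holds_filled"] / holds["holds_placed"]
flakiest = holds.sort_values("flake_rate", ascending=False).iloc[0]
print(flakiest["branch"], round(flakiest["flake_rate"], 2))
```

The resulting per-branch rates could then be mapped with GIS tooling to look for the geographic pattern mentioned above.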
The City of Chicago has considerable "zombie data." However, our group was able to determine that even the most underutilized data is not completely useless. In the future, "rehabilitation" of these lesser-accessed datasets could be an interesting project, one that raises awareness of the breadth of information available on the data portal and showcases the powerful applications and inferences that can be drawn from analysis.
The way I think of zombie datasets is looking at which datasets haven't been updated in a while, and not which ones were accessed the least.
Great idea!
Cyrus Sethna about.csethna.com
Happy Halloween. Welcome to Zombie Datasets.
The goal of this breakout group is to identify “walking dead” datasets, defined as: “publicly available datasets which have been accessed least recently with relation to their peer datasets from the same source.”
In this exercise, we will be playing Dr. Robert Neville and treating these datasets as hosts infected with the Krippin Virus. This means, unlike zombies from the Robert Kirkman (Walking Dead) or Max Brooks (World War Z) universes, these “zombie” datasets are capable of being rehabilitated, with the right use case.
Upon identification of a zombie dataset, the team will work quickly to administer GA-series Serum 391, Compound 6. This means devising a use case for which the dataset could provide meaningful insights and, time allowing, creating an MVP to demonstrate a potential application of the data.
Group leaders: @csethna on Slack (#zombie-data)
Tools: Pandas for this project, GitHub, and the Chi Hack Night Slack channel