Final project for Applied Data Science, a Fall 2016 course at NYU's Center for Urban Science + Progress taught by Dr. Stanislav Sobolevsky.
This project attempts to quantify the value of public space by investigating whether the quality of public space around a Citi Bike docking station increases ridership at that station, controlling for other possible factors. The quality of public space is quantified in this project using a combination of data, including the quality of and traffic volume on the street, the presence of bike lanes, the nearby presence of parks or subway entrances, and the number and quality of trees in the area. Controlling factors include median household income per census tract and population density. The project concludes that ridership at Citi Bike docking stations is not a good way to quantify people's preferences for public space. However, the methodology used to develop the model could, given a different dependent variable (such as urban sensing data), prove useful for both public agencies and private companies.
Copy the .env example
cp .env-example .env
Edit `.env`, adding a Google Geocoding API Key that can be acquired here.
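As a minimal sketch of how a notebook or script might read that key back, the following parses simple `KEY=VALUE` lines using only the standard library. The variable name `GOOGLE_GEOCODING_API_KEY` and both helper functions are assumptions made for illustration; check `.env-example` for the name the project actually expects.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env, name="GOOGLE_GEOCODING_API_KEY"):
    """Return the key from the parsed file, falling back to the process environment.

    `name` is a hypothetical variable name, not one confirmed by the project.
    """
    key = env.get(name) or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env")
    return key
```

Calling `get_api_key(load_env())` from the project root would then return the key, or raise a clear error if `.env` has not been filled in.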
To download the raw, external data, first run in the root directory of this project:
make download_data
This will create a collection of directories under `data/` that will store all external and processed data. This may take a few minutes depending on the speed of your internet connection.
Run through all notebooks in the `notebooks` folder in the order of their prefix. This will generate data in the `/data/processed` and `/data/map` folders to be used for analysis.
Naming convention for all notebooks in the project: order number, the creator's initials, and a short `-`-delimited description, e.g. `1.0-jp-initial-data-exploration`.
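Because the order number is a dotted numeric prefix, a plain alphabetical sort would place `10.0` before `2.0`. The sketch below shows one way to parse names that follow this convention and sort them numerically. The function names are hypothetical, and every file name in the usage note other than the example above is made up for illustration.

```python
import re
from pathlib import Path

# order number, then the creator's initials, then a hyphen-delimited description
NAME_RE = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)-(?P<initials>[A-Za-z]+)-(?P<description>.+)$"
)


def parse_notebook_name(filename):
    """Split a notebook name into (order, initials, description)."""
    name = Path(filename).name
    # Strip only the .ipynb suffix; Path.stem would cut at the dot in "1.0".
    if name.endswith(".ipynb"):
        name = name[: -len(".ipynb")]
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    # Compare order numbers as integer tuples so that 10.0 sorts after 2.0.
    order = tuple(int(part) for part in match.group("order").split("."))
    return order, match.group("initials"), match.group("description")


def sort_notebooks(filenames):
    """Return notebook names sorted by their numeric order prefix."""
    return sorted(filenames, key=lambda name: parse_notebook_name(name)[0])
```

For example, `sort_notebooks(["10.0-ab-maps.ipynb", "2.0-ab-merge.ipynb", "1.0-jp-initial-data-exploration.ipynb"])` returns the `1.0` notebook first and the `10.0` notebook last.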
Run through all notebooks in the `models` folder. These notebooks will run various regressions, outputting our findings.
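This README does not say which regression techniques the notebooks use. Purely as an illustration of what "controlling for other factors" means, here is a minimal ordinary least squares sketch using only the standard library. The function `ols`, the two predictors, and the toy numbers are all assumptions and are not project data or results; the notebooks may well use a different method or library.

```python
def ols(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y.

    X is a list of predictor rows without an intercept column.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy, made-up values: column 0 stands in for a public-space quality score,
# column 1 for a control such as household income (arbitrary units).
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_toy = [2, 5, 1, 4, 7]  # constructed as 2 + 3*quality - 1*control
coefficients = ols(X_toy, y_toy)
```

Including the control as a second column means the coefficient on the quality score is estimated with the control held fixed, which is the sense in which the project controls for income and population density.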
├── LICENSE
├── Makefile <- Makefile with commands like `make download_data`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── map <- Data to be used for mapping and visualizations.
├── notebooks <- Data processing Jupyter notebooks.
├── models <- Regression and analysis Jupyter notebooks.
├── references <- Data dictionaries, manuals, and all other explanatory materials.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
└── scripts <- Source code for use in this project.
Data | Source |
---|---|
Citi Bike Docking Stations | Citi Bike System Data |
Street Assessment Ratings | NYC Dept of Transportation |
Parks | NYC Open Data |
Subway Entrances | NYC Open Data |
Bike Lanes | NYC Dept of Transportation |
Tree Canopy | NYC Open Data |
Traffic Volume | NY State Dept of Transportation |
Citi Bike Ridership | Citi Bike System Data |
Median Household Income | United States Census |
Population Density | NYC Open Data |