campus-crime-watch / campus-crime-watch.github.io

A journalism project: an interactive map of crime data for Stanford University and information about the Clery Act.
https://campus-crime-watch.github.io/
MIT License
crime-data data-analysis news newsfeed

How We Created Campus Crimewatch

We are not actively maintaining this site. Pull requests are not accepted. Please fork this repo and create your own site!

This project is protected by copyright. See our LICENSE.

The data on Campus Crimewatch consists of daily crime incident reports at Stanford University from January 1, 2019 to April 18, 2023.

Below, we'll walk you through how we developed the different components of Campus Crimewatch and its web page. You can use our guide to create this web app for your college campus, too!

Table of Contents

Installing Python libraries

Campus Crimewatch was created using Python, HTML, CSS, and JavaScript. If you're new to Python, here are some helpful guides that cover the basics needed to execute this project.

Libraries that we use in this project:

Getting Started

Spend some time reading through our files/scripts which have comments describing the purpose of each code block and how you can personalize our code for your specific dataset.

After you've obtained the daily crime log dataset from your university, make sure to drop it in the data/raw folder to get started.

Data Cleaning & Analysis

Cleaning your dataset is the step that will take the most time and cause the most headaches. We recommend using a Jupyter Notebook to explore the data and find the gaps or inconsistencies that you need to resolve.

This is a good time to check, and if necessary fix, the data types. Date values and crime/incident descriptions should be strings for ease of display on the website. Column headers should be all lowercase and in snake_case.
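As a minimal sketch of those two checks, the snippet below normalizes column headers to snake_case and rewrites a date as a display-ready string. The column names (`Date Reported`, `Incident Description`), the `%m/%d/%Y` input format, and both helper names are assumptions for illustration; adjust them to whatever your own crime log uses.

```python
import re
from datetime import datetime


def to_snake_case(header: str) -> str:
    """Lowercase a column header and join its words with underscores."""
    # Split camelCase boundaries first, e.g. "caseNumber" -> "case Number".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", header.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", spaced).strip("_").lower()


def normalize_date(value: str, input_format: str = "%m/%d/%Y") -> str:
    """Return the date as a YYYY-MM-DD string (input format is an assumption)."""
    return datetime.strptime(value.strip(), input_format).strftime("%Y-%m-%d")


# Hypothetical row, standing in for one line of a raw crime log.
row = {"Date Reported": "4/18/2023", "Incident Description": "Petty Theft"}
clean_row = {to_snake_case(key): value for key, value in row.items()}
clean_row["date_reported"] = normalize_date(clean_row["date_reported"])
print(clean_row)
```

Keeping dates as `YYYY-MM-DD` strings means they sort correctly as plain text, which is convenient when the website later groups incidents by year and month.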

You can create a data pipeline that will:

  1. grab crime data
  2. clean the data for inconsistencies & standardize the date column
  3. geocode the locations so you can display exact locations on the map
  4. standardize the crime categories so that you can show summary statistics for each type of crime
  5. create sentences for these summary statistics
  6. export all this data to a geojson file for the interactive map
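The final export step can be sketched with the standard library alone. This is an illustration rather than the project's own `csv_to_geojson.py`: the function name, the `latitude`/`longitude`/`crime_category` column names, and the sample coordinates are all assumptions.

```python
import json


def rows_to_geojson(rows):
    """Convert dict rows with latitude/longitude into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        # Skip rows that were never geocoded; they cannot be placed on the map.
        if row.get("latitude") in (None, "") or row.get("longitude") in (None, ""):
            continue
        properties = {
            key: value
            for key, value in row.items()
            if key not in ("latitude", "longitude")
        }
        features.append({
            "type": "Feature",
            # GeoJSON coordinates are ordered [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows, as they might look after cleaning and geocoding.
rows = [
    {"date": "2023-04-18", "crime_category": "theft",
     "latitude": "37.4275", "longitude": "-122.1697"},
    {"date": "2023-04-17", "crime_category": "vandalism",
     "latitude": "", "longitude": ""},
]
collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, which is the opposite of how many geocoders report results. Swapping the two is a common reason for points landing in the wrong place on the map.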

How the data pipeline works

Files & Directories

Below is an overview of the project structure:

├── Pipfile
├── Pipfile.lock
├── README.md
├── data
│   ├── processed (Raw data that has been transformed)
│   │   ├── e.g. daily_crime_clean.csv
│   │   └── ready_for_json.csv
│   └── raw (Copy of original source data)
│       └── e.g. daily_crime_raw.pdf
├── docs (All the files that generate the web app - HTML, CSS, JavaScript)
│   ├── data (json files full of data used by JavaScript files)
│   │   ├── news_feed.json
│   │   ├── stat_sentences.json
│   │   └── crime.geojson
│   ├── index.html
│   ├── about_page.html
│   ├── clery_act.html
│   ├── main_page.css
│   ├── data_viz.js
│   ├── histogram.js
│   ├── map.js
│   ├── news_ticker.js
│   └── sentences.js
├── notebooks (Jupyter notebooks checking the quality of our dataset)
│   └── data_quality.ipynb
└── scripts (Number-prefixed data processing scripts)
    ├── extract.py
    ├── pre_process.py
    ├── clean_geocode.py
    ├── crime_category.py
    ├── csv_to_geojson.py
    ├── feed.py
    └── run_pipeline.py
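
A runner script for the files in scripts could look roughly like the sketch below. It is not the contents of the project's `run_pipeline.py`: the execution order is assumed from the order in which the scripts are listed, `feed.py` is left out because it belongs to the news ticker, and `run_steps` is a hypothetical helper name.

```python
import subprocess
import sys
from pathlib import Path

# Assumed execution order, taken from the order the scripts are listed in.
PIPELINE_STEPS = [
    "extract.py",
    "pre_process.py",
    "clean_geocode.py",
    "crime_category.py",
    "csv_to_geojson.py",
]


def run_steps(script_dir, steps=PIPELINE_STEPS):
    """Run each script in order and stop at the first failure."""
    completed = []
    for name in steps:
        script = Path(script_dir) / name
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed one.
        subprocess.run([sys.executable, str(script)], check=True)
        completed.append(name)
    return completed


# Example call, run from the repository root:
#     run_steps("scripts")
```

Stopping at the first failure matters here because each step reads the file that the previous step wrote, so continuing after an error would process stale or missing data.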

Making The Web App Go Live

We found it easiest to host the web app through GitHub Pages, so that the app is served directly from the existing GitHub repository and reflects the latest changes and commits.

For a step-by-step guide and more information, please consult the official documentation. The following are some crucial points for the app to go live:

Building The Map

The map is created using Mapbox GL JS, a JavaScript library for building customizable interactive web maps.

For more instructions on installation, please refer to the quick start guide. You will need a Mapbox account and a unique Mapbox access token to get started.

How to display the crime data on the map:

For detailed documentation of the available options, see the Mapbox GL JS guides.

Creating The News Ticker

The News Ticker displays as a black strip at the top of the site once you customize the process for acquiring crime-related news for your campus. A good place to look for pre-existing news feeds is your school paper or a local news outlet that covers crime on or near campus. It's possible that you will want to pull data from multiple sources and combine them into a single news feed.

All this is possible by customizing scripts/feed.py. This script can be run using GitHub Actions (GitHub Actions documentation) via the .github/workflows/feed.yml file.

For example, Stanford has a daily police blotter and sometimes has 'Crime & Safety' articles. The Stanford news ticker checks the Stanford Daily's news feed every 2 hours and updates the ticker if there are new relevant articles.
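
The filtering idea can be sketched as follows. This is not the project's `feed.py`: it assumes the news source publishes a standard RSS 2.0 feed, and the keyword list, the function name, and the sample feed are made up for illustration. A real script would download the feed XML first; the sketch parses a string so it runs without network access.

```python
import xml.etree.ElementTree as ET

# Assumed filter terms; tune these to the headlines your source uses.
KEYWORDS = ("police blotter", "crime", "safety")


def relevant_items(rss_xml, keywords=KEYWORDS):
    """Return title/link dicts for RSS items whose title mentions a keyword."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Case-insensitive substring match against each keyword.
        if any(word in title.lower() for word in keywords):
            items.append({"title": title, "link": link})
    return items


# Hypothetical feed content with one relevant and one irrelevant article.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Police Blotter: Week in review</title><link>https://example.com/blotter</link></item>
  <item><title>Men's soccer wins opener</title><link>https://example.com/soccer</link></item>
</channel></rss>"""

print(relevant_items(SAMPLE_FEED))
```

The resulting list of dicts can be written out with `json.dump` to a file such as docs/data/news_feed.json for the ticker to read.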

How it works:

Make your own:

Building The Histogram

The histogram uses d3.js to show crime counts by year and month. It relies on the date-related information in the .geojson data to sort the incidents.
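
The grouping behind the histogram can be illustrated in Python, even though the site itself does this in JavaScript with d3.js. The sketch assumes each GeoJSON feature carries a `date` property formatted as `YYYY-MM-DD`; both the property name and the function name are assumptions.

```python
from collections import Counter


def counts_by_year_month(geojson, date_key="date"):
    """Count features per (year, month), reading YYYY-MM-DD date strings."""
    counts = Counter()
    for feature in geojson["features"]:
        date_text = feature["properties"].get(date_key, "")
        # "2023-04-18" -> ["2023", "04", "18"]; skip anything not in that shape.
        parts = date_text.split("-")
        if len(parts) == 3:
            counts[(parts[0], parts[1])] += 1
    # Sort by (year, month) so the bars come out in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical feature collection with three dated incidents.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"date": "2023-04-18"}},
        {"type": "Feature", "geometry": None, "properties": {"date": "2023-04-02"}},
        {"type": "Feature", "geometry": None, "properties": {"date": "2022-11-30"}},
    ],
}
print(counts_by_year_month(sample))
```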

How it works:

Creating The Summary Statistic Sentences

These sentences give a quick overview of crime on Stanford's campus from year to year. For each year, they display the crime category that had the highest number of reported crimes.

How it works:

You can modify the code in create_sentences() to display different aspects of the data. For instance, if you have 5+ years of data, you can display the percent increase or decrease in reported crime over the years, the number of active cases, the locations with the highest reported crimes, etc.
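
As a starting point for that kind of customization, here is a minimal sketch of the "top category per year" logic. It is not the project's `create_sentences()`: the function name, the `year` and `category` keys, and the sentence wording are assumptions.

```python
from collections import Counter, defaultdict


def top_category_sentences(incidents):
    """Build one sentence per year naming the most-reported crime category."""
    per_year = defaultdict(Counter)
    for incident in incidents:
        per_year[incident["year"]][incident["category"]] += 1

    sentences = {}
    for year in sorted(per_year):
        # most_common(1) returns the first-seen category when counts are tied.
        category, count = per_year[year].most_common(1)[0]
        noun = "report" if count == 1 else "reports"
        sentences[year] = (
            f"In {year}, the most reported crime category was "
            f"{category} with {count} {noun}."
        )
    return sentences


# Hypothetical incidents, reduced to the two fields the sketch needs.
incidents = [
    {"year": "2022", "category": "theft"},
    {"year": "2022", "category": "theft"},
    {"year": "2022", "category": "vandalism"},
    {"year": "2023", "category": "burglary"},
]
for sentence in top_category_sentences(incidents).values():
    print(sentence)
```

Swapping the `Counter` key from category to location, or comparing yearly totals, gives the other statistics mentioned above with the same structure.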

GitHub Actions

You can automate both the data pipeline and the news ticker using GitHub Actions. You can set how often your code runs so that you don't have to run it manually. This is how our news ticker automatically updates itself when new articles are posted!

Check out .github/workflows to see our automation of the map (map_data.yml) and news ticker (feed.yml).
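
For orientation, a scheduled workflow for the news ticker might look roughly like the fragment below. This is a sketch, not a copy of the repository's feed.yml: the workflow name, job name, Python version, and commit step are assumptions, and the two-hour cron schedule is chosen to match the update frequency described for the Stanford ticker.

```yaml
name: Update news feed

on:
  schedule:
    - cron: "0 */2 * * *"   # every 2 hours, on the hour (UTC)
  workflow_dispatch:         # also allow manual runs from the Actions tab

permissions:
  contents: write            # needed to push the refreshed JSON file

jobs:
  update-feed:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"   # assumed version
      - run: pip install pipenv && pipenv install
      - run: pipenv run python scripts/feed.py
      - name: Commit updated feed
        run: |
          git config user.name "github-actions"
          git config user.email "github-actions@users.noreply.github.com"
          git add docs/data/news_feed.json
          git diff --cached --quiet || git commit -m "Update news feed"
          git push
```

The `git diff --cached --quiet ||` guard skips the commit when the feed has not changed, so scheduled runs with no new articles do not fail or create empty commits.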

Disclaimers

Remember that our code is made to fit the structure of Stanford's daily crime dataset. If your dataset is formatted differently (e.g. you need to make modifications to pdfplumber because the format is not standardized, there are extra or missing columns in your data, etc.), modify our code to account for the differences in your dataset.

There's quite a bit of manual labor to be done in the data cleaning process (e.g. scanning the dataset to make flags for the crime categories). Do not skip out on this work -- it's important for ensuring your work is accurate.
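
One way to organize that manual pass is a keyword table that you grow as you scan the data, plus a helper that lists every description the table does not yet cover. The sketch below is illustrative only: the categories, keywords, and function names are assumptions, not the flags used in the project's `crime_category.py`.

```python
# Hypothetical keyword table; the first category with a matching keyword wins.
CATEGORY_KEYWORDS = {
    "theft": ("theft", "larceny", "stolen"),
    "burglary": ("burglary",),
    "vandalism": ("vandalism", "graffiti"),
}


def categorize(description, rules=CATEGORY_KEYWORDS, default="other"):
    """Return the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default


def uncategorized(descriptions, rules=CATEGORY_KEYWORDS):
    """List the distinct descriptions that still fall through to the default."""
    return sorted({d for d in descriptions if categorize(d, rules) == "other"})


# Hypothetical raw descriptions from a crime log.
descriptions = ["PETTY THEFT - BICYCLE", "Burglary from vehicle", "Noise complaint"]
print([categorize(d) for d in descriptions])
print(uncategorized(descriptions))
```

Reviewing the output of `uncategorized` after each change to the table shows exactly which descriptions still need a human decision, which keeps the manual work visible instead of letting unmatched incidents disappear into a catch-all bucket.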

If your school does not have an RSS feed or newspaper that reports on crime and safety, you can find another way to populate the news ticker. Perhaps you could pull from Twitter conversations about public safety on your campus.

There's an option to include a tip button where students can submit tips or comments on public safety on your campus. Perhaps this feature could be a moderated discussion board on the site.

If you have to request data manually every six months or every year, you cannot update the dataset automatically every few weeks, since there is no source for the pipeline to pull from. If you do have an API you can pull from, automating the dataset's updates is a crucial feature that you should implement.