CityOfLosAngeles / covid19-indicators

Key COVID-19 and public health indicators for reopening
Apache License 2.0

covid19-indicators


Project Organization

├── LICENSE
├── Makefile                 <- Makefile with commands like `make data` or `make train`
├── README.md                <- The top-level README for developers using this project.
├── Dockerfile               <- Docker image for this project.
├── data                     <- Scripts to create the data and CSVs.
├── catalog                  <- Catalog listing data sources used.
├── notebooks                <- Jupyter notebooks.
├── conda-requirements.txt   <- The requirements file for conda installs.
├── requirements.txt         <- The requirements file for reproducing the analysis environment,
│                               e.g. generated with `pip freeze > requirements.txt`
├── main.py                  <- Used to send our daily pdf reports by email. 
├── report.py                <- Used to automate writing the daily report on GitHub pages. 
├── report_county_trends.py  <- Used to automate writing the daily report on GitHub pages. 
├── setup.py                 <- Makes project pip installable (pip install -e .) 

This repository will track COVID-19 indicators as LA considers its reopening strategy. We will also provide sample notebooks for how others can use the Johns Hopkins University COVID-19 data, which is available for all US counties, to look at trends in other counties or states. Related repo: https://github.com/CityOfLosAngeles/covid19-rmarkdown

LA COUNTY DETAILED DAILY REPORT: https://cityoflosangeles.github.io/covid19-indicators/coronavirus-stats.html

CA COUNTIES REPORT: https://tinyurl.com/cacovidtrends

OTHER MAJOR US COUNTIES REPORT: https://tinyurl.com/uscountycovidtrends

LA COUNTY NEIGHBORHOODS REPORT: https://tinyurl.com/laneighborhoodcovidtrends

The City of LA uses US county data published by JHU. The historical time-series is pulled from JHU's CSV on GitHub and appended with the current date's data from the ESRI feature layer.
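The append step described above can be sketched with pandas. This is a minimal illustration, not the repository's actual pipeline: the column names, the sample values, and the `append_current` helper are all hypothetical, and small in-memory tables stand in for JHU's CSV and the ESRI feature layer.

```python
import pandas as pd

# Toy stand-in for the historical time series (hypothetical columns and values).
historical = pd.DataFrame(
    {
        "county": ["Los Angeles", "Los Angeles"],
        "date": pd.to_datetime(["2020-05-01", "2020-05-02"]),
        "cases": [100, 120],
    }
)

# Toy stand-in for the current date's snapshot from the feature layer.
current = pd.DataFrame(
    {
        "county": ["Los Angeles"],
        "date": pd.to_datetime(["2020-05-03"]),
        "cases": [150],
    }
)


def append_current(historical, current):
    """Append the current rows to the history, keeping one row per county and date."""
    combined = pd.concat([historical, current], ignore_index=True)
    # If the snapshot overlaps the history, keep the most recently appended row.
    combined = combined.drop_duplicates(subset=["county", "date"], keep="last")
    return combined.sort_values(["county", "date"]).reset_index(drop=True)


combined = append_current(historical, current)
```

Dropping duplicates on county and date means that running the append twice on the same day should not double-count that day's row.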

Our data sources are public, and the smaller files are stored in the data folder.

  1. Data Sources: Cases, Hospital and Testing
  2. Helpful Hints for Jupyter Notebooks
  3. Setting up a Conda Environment
  4. Starting with Docker
  5. Emailing the Report

Data Sources

Scripts to ingest, process, and save our data sources are in the data folder. Use the helpful hints to access the data.

COVID-19 Cases

Hospital Data

COVID-19 Testing

Helpful Hints

Jupyter Notebooks can read in both the ESRI feature layer and the CSV.

Ex: JHU global province-level time-series feature layer and CSV

Import the CSV

All you need is the item ID of the CSV item. We use an f-string to construct the URL and the pandas package to import the CSV.

JHU_GLOBAL_ITEM_ID = "daeef8efe43941748cb98d7c1f716122"

JHU_URL = f"http://lahub.maps.arcgis.com/sharing/rest/content/items/{JHU_GLOBAL_ITEM_ID}/data"

TESTING_URL = (
    "https://raw.githubusercontent.com/CityOfLosAngeles/covid19-indicators/"
    "master/data/county-city-testing.csv"
)

import pandas as pd

# Use separate names so the second import does not overwrite the first
jhu_df = pd.read_csv(JHU_URL)
testing_df = pd.read_csv(TESTING_URL)

Import from data catalog

import intake
import pandas as pd

catalog = intake.open_catalog("../catalog.yml")

# See which files are inside the catalog
list(catalog)

# To open a file called ca_hospital_surge_capacity:
df = catalog.ca_hospital_surge_capacity.read()

Import ESRI feature layer

FEATURE_LAYER_URL = "http://lahub.maps.arcgis.com/home/item.html?id=20271474d3c3404d9c79bed0dbd48580"

SERVICE_URL = "https://services5.arcgis.com/7nsPwEMP38bSkCjy/arcgis/rest/services/jhu_covid19_time_series/FeatureServer/0"

CORRECT_URL = "https://services5.arcgis.com/7nsPwEMP38bSkCjy/ArcGIS/rest/services/jhu_covid19_time_series/FeatureServer/0/query?where=1%3D1&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&resultType=none&distance=0.0&units=esriSRUnit_Meter&returnGeodetic=false&outFields=Province_State%2C+Country_Region%2C+Lat%2C+Long%2C+date%2C+number_of_cases%2C+number_of_deaths%2C+number_of_recovered%2C+ObjectId&returnGeometry=true&featureEncoding=esriDefault&multipatchOption=xyFootprint&maxAllowableOffset=&geometryPrecision=&outSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=&returnZ=false&returnM=false&returnExceededLimitFeatures=true&quantizationParameters=&sqlFormat=none&f=pgeojson&token="

import geopandas as gpd
gdf = gpd.read_file(CORRECT_URL)
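The long query URL above is the service URL followed by `/query` and a URL-encoded parameter string. The sketch below shows how a shorter version could be assembled with the standard library. `build_query_url` is a hypothetical helper, not part of this repository, and it assumes the service falls back to its defaults for the parameters left out here.

```python
from urllib.parse import urlencode

SERVICE_URL = (
    "https://services5.arcgis.com/7nsPwEMP38bSkCjy/arcgis/rest/services/"
    "jhu_covid19_time_series/FeatureServer/0"
)


def build_query_url(service_url, out_fields, where="1=1"):
    """Build a feature-layer query URL that asks for GeoJSON output."""
    params = {
        "where": where,  # "1=1" matches every row
        "outFields": ", ".join(out_fields),
        "returnGeometry": "true",
        "f": "pgeojson",  # same output format as the URL above
    }
    # urlencode percent-encodes "=" and "," and turns spaces into "+"
    return f"{service_url}/query?{urlencode(params)}"


query_url = build_query_url(
    SERVICE_URL,
    ["Province_State", "Country_Region", "date", "number_of_cases"],
)
print(query_url)
```

The resulting string can be passed to `gpd.read_file` in the same way as `CORRECT_URL`.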

To convert to HTML: jupyter nbconvert --to html --no-input --no-prompt my-notebook.ipynb

Setting up a Conda Environment

  1. conda create --name my_project_name
  2. conda activate my_project_name (older conda versions: source activate my_project_name)
  3. conda install --file conda-requirements.txt -c conda-forge
  4. pip install -r requirements.txt

Starting with Docker

  1. Start with Steps 1-2 of Setting up a Conda Environment above
  2. Build Docker container: docker-compose.exe build
  3. Start Docker container: docker-compose.exe up
  4. Open Jupyter Lab notebook by typing localhost:8888/lab/ in the browser.

Project based on the cookiecutter data science project template. #cookiecutterdatascience

Emailing the Report

To set up the report for daily emailing, you'll need to have AWS SES configured and set up on your account.

  1. docker-compose build
  2. docker-compose run lab python /app/main.py
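For readers unfamiliar with emailing a PDF from Python, the sketch below builds a message with a PDF attachment using only the standard library. It is an illustration under stated assumptions, not the contents of main.py: the `build_report_email` helper, the subject line, the addresses, and the placeholder bytes are all hypothetical.

```python
from email.message import EmailMessage


def build_report_email(sender, recipients, pdf_bytes, filename="report.pdf"):
    """Build an email message with the PDF report attached."""
    msg = EmailMessage()
    msg["Subject"] = "Daily COVID-19 indicators report"  # hypothetical subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The daily report is attached as a PDF.")
    # Attach the raw PDF bytes with the application/pdf content type.
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


msg = build_report_email(
    "sender@example.com",  # placeholder addresses
    ["recipient@example.com"],
    b"%PDF-1.4 placeholder bytes",  # stand-in for a real PDF file's contents
)
attachments = list(msg.iter_attachments())

# Assumption: with boto3 installed and SES configured, a message like this
# could be sent with the SES client's send_raw_email, passing msg.as_bytes()
# as the raw message data.
```

Sending is left out of the sketch because it requires AWS credentials and an address verified in SES.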

Pushing to Socrata Open Data Portal

A set of datasets is also published to the Socrata open data portal; see data/socrata.py.