Chicago / census2020_ward_rpt

Census 2020 Report
https://www.chicago.gov/city/en/sites/census2020/home/ward-reports.html
MIT License
2 stars 1 forks source link

Census tracts not found in crosswalk #11

Closed geneorama closed 4 years ago

geneorama commented 4 years ago

@sherryshenker @srao-civis

Do you know why so few of the census tracts are matched to the crosswalk?

Reading in just the tracts and crosswalk:

ex <- read_civis(sql("SELECT tract FROM cic.pdb2019trv3_us"), database = "City of Chicago")
cw <- read.csv("data_census_planning/crosswalk.csv")

Seeing what's in what:

table(ex$tract %in% cw$census_tract)

Yields this table, so the vast majority are not found

FALSE  TRUE 
69599  4425 

The tracts look different:

head(ex$tract)
[1]  20100  20200 981903  20300  20400  20500
> head(cw$census_tract)
[1] 1001 1002 1003 1004 1005 1006

Am I comparing the wrong things?

geneorama commented 4 years ago

I see what's going on; pdb2019trv3_us is for the whole country.

When I look at the overlap between our shape file and the Civis PDB, it makes more sense. table(shp_tracts$TRACTCE %in% civisdata$tract) yields this table:

FALSE  TRUE 
  146   650 

Now my question is why are so many tracts missing?

If I plot the tracts I see a pattern:

plot(shp_tracts)
plot(shp_tracts[!shp_tracts$TRACTCE %in% civisdata$tract, ], add=T,col="red")

image

(@DForbush BTW, you made a comment about "why are we losing rows" at this merge step. This is why. But I'm not sure which rows you meant; the nationwide ones, or the red area)

srao-civis commented 4 years ago

Hi Gene - can you confirm what shapefile (shp_tracts) and Civis PDB (civisdata) you're using to do this merge? That'll help me try to pinpoint what's going on here.

geneorama commented 4 years ago

Sorry for the delay, we just had a fire drill data_maps/tracts.geojson

srao-civis commented 4 years ago

No worries. Is the civisdata the cic.pdb2019trv3_us table or something else?

geneorama commented 4 years ago

@srao-civis yes

You can see the files and table names in the issue by the way

geneorama commented 4 years ago

@srao-civis I take it back, I didn't show the map loading part

geneorama commented 4 years ago

@srao-civis

This should fully reproduce the example, if you have an API key

##------------------------------------------------------------------------------
## INITIALIZATION
##------------------------------------------------------------------------------
library(data.table)

## Steps to read civis data
## For key setup, see https://civisanalytics.github.io/civis-r/
library(civis)

## Set the key using a file, or do it manually here:
if(file.exists("config/setkey.R"){
    source("config/setkey.R")
} else {
    Sys.setenv(CIVIS_API_KEY="INSERT YOUR KEY HERE")
}

##------------------------------------------------------------------------------
## Read from civis & crosswalk
##------------------------------------------------------------------------------
q_cook <- "SELECT tract FROM cic.pdb2019trv3_us WHERE county=31 AND state=17"
q_all <- "SELECT tract, state, county FROM cic.pdb2019trv3_us"
ex_cook <- data.table(read_civis(sql(q_cook), database = "City of Chicago"))
ex_all <- data.table(read_civis(sql(q_all), database = "City of Chicago"))

cw <- fread("data_census_planning/crosswalk.csv")
cw[ , census_tract := census_tract * 100]

##------------------------------------------------------------------------------
## Tables
##------------------------------------------------------------------------------
table(ex_cook$tract %in% cw$census_tract)
table(ex_all$tract %in% cw$census_tract)
ex_all[ex_all$tract %in% cw$census_tract]

head(ex_cook$tract)
head(cw$census_tract)

##------------------------------------------------------------------------------
## Map
##------------------------------------------------------------------------------
plot(shp_tracts)
plot(shp_tracts[!shp_tracts$TRACTCE %in% ex_all$tract, ], add=T, col="red")

plot(shp_tracts)
plot(shp_tracts[!shp_tracts$TRACTCE %in% ex_cook$tract, ], add=T, col="red")
geneorama commented 4 years ago

@srao-civis I think I understand the issue

First, you have to multiply the tract by 100 in the crosswalk in order to match the database. That was a detail that I missed in Dan's code, but he was doing it (thanks Dan!)

The fact that any tracts matched before was a total coincidence, and it was because I was pulling tracts from across the country.

Second; the red area. I'm guessing that these tracts were renamed in my map vs. the Civis map. I'm not sure what year my map is, but it's probably older.

Third, we don't need the crosswalk anymore anyway.

geneorama commented 4 years ago

@srao-civis Actually, I do need this to work, because the tracts show up in the report.

Multiplying by 100 gets me most of the way there, but there are still tracts missing, and I don't know where they are.

Can you please upload a .json shapefile for the census tracts? You can put it right into data_maps. I need it to have some meta data for tract names.

More detail:

Right now there are 802 tracts in the crosswalk, all but 19 are found in the the civis planning database. I'd guess that the ones that are not found are oddballs with no population, like industrial areas.

However, that whole area on the NW side is missing; that is I can't relate that area to a ward based on your crosswalk.

geneorama commented 4 years ago

I downloaded the 2019 shape file from https://www2.census.gov/geo/tiger/TIGER2019/TRACT/ and I'm getting the same results, so it's not the age of my shapefile.

sherryshenker commented 4 years ago

@geneorama I think I might have found the issue. The column you are using in the GEOJSON to join (TRACTCE) is a string that has leading zeros for some tracts. The column in the Civis file is an integer without leading zeros. When I add in shp_tracts$tract <- as.numeric(shp_tracts$TRACTCE) and use that, I'm not seeing any missing tracts. Let me know if that fixes things for you!

geneorama commented 4 years ago

Converting to six digits with leading zeros completely solves the problem

I can never remember how to do the sprintf without looking it up, but this took care of the matching:
civis_pdb[ , tract := sprintf("%06i", as.integer(tract))]