Closed geneorama closed 4 years ago
I see what's going on; pdb2019trv3_us
is for the whole country.
When I look at the overlap between our shape file and the Civis PDB, it makes more sense.
table(shp_tracts$TRACTCE %in% civisdata$tract)
yields this table:
FALSE TRUE
146 650
Now my question is why are so many tracts missing?
If I plot the tracts I see a pattern:
plot(shp_tracts)
plot(shp_tracts[!shp_tracts$TRACTCE %in% civisdata$tract, ], add=T,col="red")
(@DForbush BTW, you made a comment about "why are we losing rows" at this merge step. This is why. But I'm not sure which rows you meant; the nationwide ones, or the red area)
Hi Gene - can you confirm what shapefile (shp_tracts) and Civis PDB (civisdata) you're using to do this merge? That'll help me try to pinpoint what's going on here.
Sorry for the delay, we just had a fire drill data_maps/tracts.geojson
No worries. Is the civisdata the cic.pdb2019trv3_us table or something else?
@srao-civis yes
You can see the files and table names in the issue by the way
@srao-civis I take it back, I didn't show the map loading part
@srao-civis
This should fully reproduce the example, if you have an API key
##------------------------------------------------------------------------------
## INITIALIZATION
##------------------------------------------------------------------------------
library(data.table)
## Steps to read civis data
## For key setup, see https://civisanalytics.github.io/civis-r/
library(civis)
## Set the key using a file, or do it manually here:
if(file.exists("config/setkey.R"){
source("config/setkey.R")
} else {
Sys.setenv(CIVIS_API_KEY="INSERT YOUR KEY HERE")
}
##------------------------------------------------------------------------------
## Read from civis & crosswalk
##------------------------------------------------------------------------------
q_cook <- "SELECT tract FROM cic.pdb2019trv3_us WHERE county=31 AND state=17"
q_all <- "SELECT tract, state, county FROM cic.pdb2019trv3_us"
ex_cook <- data.table(read_civis(sql(q_cook), database = "City of Chicago"))
ex_all <- data.table(read_civis(sql(q_all), database = "City of Chicago"))
cw <- fread("data_census_planning/crosswalk.csv")
cw[ , census_tract := census_tract * 100]
##------------------------------------------------------------------------------
## Tables
##------------------------------------------------------------------------------
table(ex_cook$tract %in% cw$census_tract)
table(ex_all$tract %in% cw$census_tract)
ex_all[ex_all$tract %in% cw$census_tract]
head(ex_cook$tract)
head(cw$census_tract)
##------------------------------------------------------------------------------
## Map
##------------------------------------------------------------------------------
plot(shp_tracts)
plot(shp_tracts[!shp_tracts$TRACTCE %in% ex_all$tract, ], add=T, col="red")
plot(shp_tracts)
plot(shp_tracts[!shp_tracts$TRACTCE %in% ex_cook$tract, ], add=T, col="red")
@srao-civis I think I understand the issue
First, you have to multiply the tract by 100 in the crosswalk in order to match the database. That was a detail that I missed in Dan's code, but he was doing it (thanks Dan!)
The fact that any tracts matched before was a total coincidence, and it was because I was pulling tracts from across the country.
Second; the red area. I'm guessing that these tracts were renamed in my map vs. the Civis map. I'm not sure what year my map is, but it's probably older.
Third, we don't need the crosswalk anymore anyway.
@srao-civis Actually, I do need this to work, because the tracts show up in the report.
Multiplying by 100 gets me most of the way there, but there are still tracts missing, and I don't know where they are.
Can you please upload a .json shapefile for the census tracts? You can put it right into data_maps
. I need it to have some meta data for tract names.
More detail:
Right now there are 802 tracts in the crosswalk, all but 19 are found in the the civis planning database. I'd guess that the ones that are not found are oddballs with no population, like industrial areas.
However, that whole area on the NW side is missing; that is I can't relate that area to a ward based on your crosswalk.
I downloaded the 2019 shape file from https://www2.census.gov/geo/tiger/TIGER2019/TRACT/ and I'm getting the same results, so it's not the age of my shapefile.
@geneorama I think I might have found the issue. The column you are using in the GEOJSON to join (TRACTCE) is a string that has leading zeros for some tracts. The column in the Civis file is an integer without leading zeros. When I add in shp_tracts$tract <- as.numeric(shp_tracts$TRACTCE)
and use that, I'm not seeing any missing tracts. Let me know if that fixes things for you!
Converting to six digits with leading zeros completely solves the problem
I can never remember how to do the sprintf without looking it up, but this took care of the matching:
civis_pdb[ , tract := sprintf("%06i", as.integer(tract))]
@sherryshenker @srao-civis
Do you know why so few of the census tracts are matched to the crosswalk?
Reading in just the tracts and crosswalk:
Seeing what's in what:
Yields this table, so the vast majority are not found
The tracts look different:
Am I comparing the wrong things?