Open JoeGermuska opened 10 years ago
This came up recently with a UserVoice comment about the data exports, but it remains an issue for that community analyzing Block Group data.
Proposed update, based on this example:
Tract 8067, Block Group 1, Cook County, IL
A bit harder than it looks because the name is set via SQL query against the TIGER BG table, which has only the six "digit" tractce
value. It would take clever PostgreSQL, a stored procedure, or a second pass on the data to deal with leading 0 and the fact that, in print, the last two characters of tractce
appear after a decimal point, and only if they are non-zero.
We could join with tract
and either use NAME
or NAMELSAD
, and some simple PostgreSQL string manipulation.
Alternatively, this stored procedure produces the exact same value as NAME
without needing to join:
CREATE FUNCTION format_tract(tract_ce text)
RETURNS text
AS $$
if tract_ce.endswith('00'):
return tract_ce.lstrip('0')[:-2]
return tract_ce[:4].lstrip('0') + '.' + tract_ce[-2:]
$$ LANGUAGE plpythonu;
@iandees do you have an opinion? Do you think we could close this with the December ACS 2015-2019 5-year data load?
I'd rather not use stored procedures as it makes it harder to maintain. If we want to run Python we can just run it over the CSV before we load it into Postgres.
On UserVoice, a user pointed out that block group numbers are not unique within counties. Therefore, our names are not unique (see screen shot).
Ideally as part of loading the ACS 2013-5 year data, we should change the name construction to include the Census tract number.