Closed martinjbraun closed 2 years ago
Thanks for flagging. I will fix in the next release.
Here's what's causing the problem, in case you're interested: Under the hood, getcensus has a list of the geography variables that are returned with a given geography(), and it avoids destringing those variables. I see that tract was omitted from that list for geography(bg). Once it's added, getcensus will no longer destring tract. The str3 formatting of blockgroup is residue from how getcensus imports and cleans up the API response. I think the best way for me to fix it is to add a simple compress at the end of that process.
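The two effects described above can be reproduced on toy data without calling the API. This is only a sketch: the values and the variable name tract_num are made up for illustration, and it is not getcensus's internal code.

```stata
* Toy data standing in for an API response (values are illustrative only)
clear
input str6 tract str3 blockgroup
"001200" "1"
"980100" "2"
end

* Destringing a tract code turns it into a number, so leading zeros are lost
destring tract, generate(tract_num)
list tract tract_num

* compress shortens a string variable to the longest value it actually holds,
* so blockgroup goes from str3 to str1 here
compress blockgroup
describe blockgroup
```

Keeping tract out of the destring step preserves the original six-character code, and compress only changes the storage type, never the values.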
Is your feature request related to a problem? Please describe.
When importing block group data using getcensus, the variables tract and blockgroup are not formatted correctly. The variable tract is stored as long and should be str6. Practically, this is a problem because many Census tracts have leading 0's, which are lost when tract is stored as long instead of as a string. Census block groups are always 1 digit, but blockgroup gets stored as str3 after running getcensus. There is no loss of information for block groups, but there is no reason blockgroup shouldn't be str1.
This would make it easy to generate, for example, 12-digit geoids for each block group
gen geo12=state+county+tract+blockgroup
without any errors due to tract or blockgroup being formatted incorrectly.
Describe the solution you'd like
It would be great if geographic variables were stored as strings after running getcensus so the formatting matched the formatting from the Census: https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html
For example, it would be great if the following geographies always had these formats after running getcensus:
state str2
county str3
tract str6
blockgroup str1
I haven't checked other geographic levels, like zip code, but it would also be great if these had the correct formats after importing them. Having standardized formatting would help ensure users don't make mistakes later.
Describe alternatives you've considered
I have written my own code to reformat the variables after the fact, but I don't know if it works in all cases.
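For anyone who needs a stopgap, a post-import cleanup along these lines might work. It is a minimal sketch, not the code referred to above: it uses toy data in place of a real getcensus call, assumes state and county already arrive as str2 and str3, and tract_str is a hypothetical temporary name.

```stata
* Toy data mimicking the reported storage types (values are illustrative only)
clear
input long tract str3 blockgroup str2 state str3 county
1200   "1" "37" "183"
980100 "2" "37" "183"
end

* If tract is numeric, rebuild the zero-padded 6-character string
capture confirm numeric variable tract
if !_rc {
    generate str6 tract_str = string(tract, "%06.0f")
    drop tract
    rename tract_str tract
}

* Shrink blockgroup to the shortest string type that fits its values
compress blockgroup

* Build the 12-digit block group GEOID and check its length
generate str12 geo12 = state + county + tract + blockgroup
assert strlen(geo12) == 12
list geo12
```

The confirm check makes the padding step a no-op once tract is already a string, so the same lines should be safe to run both before and after the fix is released.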
Additional context
Consider the following example
getcensus B25075_001, sample(5) years(2015) geography(bg) statefips(37) countyfips(183) clear
If you then run
tab tract
you'll see that some are 5 digits and some are 6 digits. By running describe tract we see that this is because tract is stored as long instead of str6.
If you run
describe blockgroup
you'll see blockgroup is stored as a str3 even though Census block groups are always 1 digit.