lecy opened 4 years ago
I'm a bit confused by these instructions:
st.fips <- state + 10000
st.fips <- substr( st.fips, 4, 5 ) # extract last two numbers
ct.fips <- county + 10000
ct.fips <- substr( ct.fips, 3, 5 ) # extract last three numbers
county.fips <- paste0( st.fips, ct.fips )
If my FIPS is: 17-031-010100, would the following code be correct?
st.fips <- 17 + 10000
st.fips <- substr( st.fips, 1, 7 ) # extract last two numbers
ct.fips <- 031 + 10000
ct.fips <- substr( ct.fips, 3, 1 ) # extract last three numbers
fips <- paste0( st.fips, ct.fips )
> # this doesn't work
> st.fips <- 17 + 10000
> substr( st.fips, 1, 7 )
[1] "10017"
>
> # substr() extracts part of a string
> # start = position of the first character to keep
> # stop = position of the last character to keep
> args( substr )
function (x, start, stop)
NULL
>
> x <- "aloysius snuffleupagus"
> substr( x, 10, 16 )
[1] "snuffle"
>
> # leading zeros problem
> state <- 01
> county <- 030
> tract <- 999911
> paste( state, county, tract, sep="-" )
[1] "1-30-999911"
>
> # should be 01-030-999911
>
> # convert numeric vectors to character
> # with the leading zeros intact
>
> substr( state+10000, start=4, stop=5 )
[1] "01"
> substr( county+10000, start=3, stop=5 )
[1] "030"
>
> s.fixed <- substr( state+10000, start=4, stop=5 )
> c.fixed <- substr( county+10000, start=3, stop=5 )
> t.fixed <- substr( tract+100000000, start=4, stop=9 )
>
> paste( s.fixed, c.fixed, t.fixed, sep="-" )
[1] "01-030-999911"
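As an alternative to the add-and-substr trick above, here is a minimal sketch that pads each component with `sprintf()` instead. The helper name `pad_fips` is hypothetical, not something from the original instructions, and it assumes the inputs are whole numbers. It is applied to the example from the question (17-031-010100), where R reads `031` as the number 31.

```r
# Hypothetical helper (not from the original instructions):
# zero-pad each FIPS component to its fixed width with sprintf().
# Assumes whole-number inputs; sprintf( "%d" ) errors on values like 1.5.
pad_fips <- function( state, county, tract, sep = "" ) {
  paste( sprintf( "%02d", state ),    # state  = 2 digits
         sprintf( "%03d", county ),   # county = 3 digits
         sprintf( "%06d", tract ),    # tract  = 6 digits
         sep = sep )
}

pad_fips( 1, 30, 999911, sep = "-" )   # "01-030-999911"
pad_fips( 17, 31, 10100 )              # "17031010100"
```

The widths are spelled out in the format strings, so there is no need to count positions in a padded number.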
@sunaynagoel
Fixing FIPS codes to avoid leading zero problems.
You can easily parse this ID to extract state, county, and tract when needed.
Note that state FIPS is actually 2 digits, not 3.
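A minimal sketch of that parsing step, assuming the ID is stored as an 11-character string with a 2-digit state, 3-digit county, and 6-digit tract (the value below is the example from the question):

```r
# Assumes a complete 11-character FIPS stored as character, not numeric
fips <- "17031010100"

state  <- substr( fips, 1, 2 )    # first two characters
county <- substr( fips, 3, 5 )    # next three characters
tract  <- substr( fips, 6, 11 )   # last six characters

paste( state, county, tract, sep="-" )   # "17-031-010100"
```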
You could create other levels by adding a leading character. This format perhaps?
s-## c-### t-######
If you convert a number to a character to solve the problem, then write the dataset to a CSV and re-load it, the vector will be converted back to a number and the leading zeros will be lost again.
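The round trip can be sketched as follows. The data frame and file are made up for illustration; the `colClasses` argument of `read.csv()` is one way to keep the columns as character on re-load.

```r
# Illustrative data: padded FIPS components stored as character
d <- data.frame( state  = c( "01", "17" ),
                 county = c( "030", "031" ),
                 stringsAsFactors = FALSE )

f <- tempfile( fileext = ".csv" )
write.csv( d, f, row.names = FALSE )

d2 <- read.csv( f )                             # columns re-read as numbers
d3 <- read.csv( f, colClasses = "character" )   # columns kept as character

d2$state   # 1 17       (leading zero gone)
d3$state   # "01" "17"  (leading zero intact)
```

This only helps if everyone who loads the file remembers to set `colClasses`.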
If there is a leading character, you will never have the leading zeros problem, because the value can no longer be read as a number. But you might need to remove the leading codes before combining the parts into a unified FIPS.
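Here is a rough sketch of that idea using the `s-## c-### t-######` format. The column names and the `strip` helper are hypothetical.

```r
# Illustrative data using prefixed components
d <- data.frame( state  = "s-01",
                 county = "c-030",
                 tract  = "t-999911",
                 stringsAsFactors = FALSE )

f <- tempfile( fileext = ".csv" )
write.csv( d, f, row.names = FALSE )
d2 <- read.csv( f, stringsAsFactors = FALSE )   # "s-01" cannot be read as a number

# Hypothetical helper: drop the one-letter prefix and the dash
strip <- function( x ) sub( "^[a-z]-", "", x )

fips <- paste0( strip( d2$state ), strip( d2$county ), strip( d2$tract ) )
fips   # "01030999911"
```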
The biggest issue is when someone creates a unified FIPS without resolving leading zeros.
Now, if a FIPS code is fewer than 11 digits long, you have no idea which zero is missing, and the integrity of your data is broken for a subset of observations. That's the type of problem you can't undo if you don't have the original sub-components of the IDs.
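A small illustration of why this can't be undone: two different state and county pairs collapse into the same string when the parts are pasted together without padding. The tract number is reused from the example above, and `sprintf()` is used here for the padding.

```r
# Unpadded: two different places produce the same ID
a <- paste0( 1, 11, 999911 )    # state 01, county 011
b <- paste0( 11, 1, 999911 )    # state 11, county 001
a        # "111999911"
a == b   # TRUE, so the original components cannot be recovered

# Padded before combining: the IDs stay distinct and are 11 characters long
a.fixed <- paste0( sprintf( "%02d", 1 ),  sprintf( "%03d", 11 ), sprintf( "%06d", 999911 ) )
b.fixed <- paste0( sprintf( "%02d", 11 ), sprintf( "%03d", 1 ),  sprintf( "%06d", 999911 ) )
a.fixed  # "01011999911"
b.fixed  # "11001999911"
```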