Data4Democracy/house_expenditures
Clean chr vars office #16
Closed
supermdat closed this 7 years ago
supermdat commented 7 years ago
Cleaning and standardization of the variable "office."
Using the Jaro-Winkler distance to calculate distances between unique entries in the "office" variable.
Looking particularly at distances below 0.1 (an arbitrarily chosen cutoff).
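A minimal Python sketch of the distance step. The issue does not say which implementation was used (the "chr vars" wording suggests R), so this assumes the textbook definition: distance = 1 - Jaro-Winkler similarity, with prefix scale p = 0.1 and at most four shared prefix characters. The helper names and the sample office strings are hypothetical.

```python
from itertools import combinations


def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1], where 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    k = mismatched = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatched += 1
            k += 1
    t = mismatched / 2
    m = matches
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler_distance(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """1 - Jaro-Winkler similarity; shared leading characters shrink the distance."""
    sim = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return 1.0 - (sim + prefix * p * (1.0 - sim))


def close_pairs(values, threshold: float = 0.1):
    """Compare every pair of unique values and keep those closer than threshold."""
    unique = sorted(set(values))
    pairs = []
    for a, b in combinations(unique, 2):
        d = jaro_winkler_distance(a, b)
        if d < threshold:
            pairs.append((a, b, round(d, 4)))
    return pairs


# Hypothetical sample of raw "office" values.
offices = ["COMMITTEE ON RULES", "COMMITEE ON RULES",
           "OFFICE OF THE CLERK", "OFFICE OF THE CLERK"]
pairs = close_pairs(offices)
for a, b, d in pairs:
    print(f"{a!r} ~ {b!r} (distance {d})")
```

Comparing only unique entries keeps the pairwise work proportional to the number of distinct spellings rather than the number of rows.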
Then, creating a lookup table for correcting misspellings.
Then, updating the main table with the corrected spellings.
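The lookup-table and update steps might look like the sketch below. The rule for deciding which spelling is correct is an assumption, since the issue does not state one: within a flagged pair the more frequent spelling wins, with ties broken alphabetically. Plain dicts stand in for rows of the main table, and `build_lookup`, `apply_lookup` and the `amount` field are hypothetical.

```python
from collections import Counter


def build_lookup(pairs, counts):
    """Map each variant spelling to the spelling treated as correct.

    Assumed rule: within a flagged pair, the more frequent spelling wins;
    ties go to the alphabetically first one.
    """
    lookup = {}
    for a, b in pairs:
        canonical, variant = sorted((a, b), key=lambda s: (-counts[s], s))
        lookup[variant] = canonical
    # Collapse chains (A -> B -> C) so each variant points at a final spelling.
    # This terminates because each step moves to a strictly higher-ranked spelling.
    for variant in list(lookup):
        target = lookup[variant]
        while target in lookup:
            target = lookup[target]
        lookup[variant] = target
    return lookup


def apply_lookup(rows, lookup, column="office"):
    """Return corrected copies of rows; values not in the lookup are kept as-is."""
    return [{**row, column: lookup.get(row[column], row[column])} for row in rows]


# Hypothetical main table: one dict per expenditure record.
rows = [
    {"office": "COMMITTEE ON RULES", "amount": 120.0},
    {"office": "COMMITEE ON RULES", "amount": 35.5},
    {"office": "COMMITTEE ON RULES", "amount": 10.0},
]
counts = Counter(row["office"] for row in rows)
lookup = build_lookup([("COMMITEE ON RULES", "COMMITTEE ON RULES")], counts)
cleaned = apply_lookup(rows, lookup)
print(lookup)
```

Resolving chains inside the lookup table means the main table can be corrected in a single pass, and returning copies leaves the raw rows available for checking the corrections.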