Data4Democracy / house_expenditures


Clean chr vars category #15

Closed supermdat closed 7 years ago

supermdat commented 7 years ago

This branch is to clean/standardize the variable "category".

Specifically, the Jaro–Winkler distance is calculated for each pair of category variations. Pairs with a small distance are then inspected. For "category" specifically, no variations have small distances.
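
For anyone reading along, here is a rough sketch of that pairwise check in plain Python. It is not the code from this branch: the function names, the 0.1 cutoff, and the sample category strings are all made up for illustration, and it assumes the common Winkler settings (prefix weight 0.1, prefix capped at 4 characters, no boost threshold).

```python
from itertools import combinations


def jaro(s, t):
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters count as matching only if they are within this many positions.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    s_matched = [c for c, hit in zip(s, s_hit) if hit]
    t_matched = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (
        matches / len(s)
        + matches / len(t)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler_distance(s, t, p=0.1):
    """1 - Jaro-Winkler similarity; small values mean near-duplicates."""
    sim = jaro(s, t)
    # Length of the shared prefix, capped at 4 characters.
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return 1.0 - (sim + prefix * p * (1.0 - sim))


def close_pairs(values, threshold=0.1):
    """Return (distance, a, b) for every distinct pair at or below threshold.

    The threshold is an assumed cutoff, chosen only for this sketch.
    """
    unique = sorted(set(values))
    scored = [
        (jaro_winkler_distance(a, b), a, b) for a, b in combinations(unique, 2)
    ]
    return sorted(item for item in scored if item[0] <= threshold)


if __name__ == "__main__":
    # Hypothetical category values, not taken from the real data.
    sample = ["TRAVEL", "TRAVLE", "FRANKED MAIL"]
    for dist, a, b in close_pairs(sample):
        print(f"{dist:.4f}  {a!r} ~ {b!r}")
```

An empty result from `close_pairs` on the real category values would correspond to the "no variations with small distances" outcome described above.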