Data4Democracy / house_expenditures


Function for calculating string distances/similarities BY FIRST CHARACTER #26

Closed supermdat closed 7 years ago

supermdat commented 7 years ago

This function is identical to func_dist, except that func_dist_ByFrstChr runs a separate distance calculation for each group of words that begin with the same character: first only the words beginning with "a", then those beginning with "b", then "c", and so on. This is done because some variables have a large number of unique text entries, and running the calculation on the entire dataset at once required more computational resources than my laptop had.

NOTE: This implicitly assumes that words that do not begin with the same character are unrelated, so two entries that differ in their first character are never compared with each other.
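
To illustrate the idea, here is a minimal Python sketch of grouping by first character before comparing. It is not the code from this pull request: the names levenshtein and distances_by_first_char, the choice of edit distance as the metric, the lowercasing of the first character, and the sample strings are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def distances_by_first_char(values):
    """Pairwise distances, computed only within groups sharing a first character.

    Hypothetical helper: it groups the unique, non-empty strings by their
    lowercased first character and compares pairs inside each group only.
    """
    groups = defaultdict(set)
    for v in values:
        if v:  # empty strings have no first character, so skip them
            groups[v[0].lower()].add(v)

    results = {}
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            results[(a, b)] = levenshtein(a, b)
    return results


# Made-up sample data for illustration only.
names = ["amazon", "amazn", "apple", "best buy", "bestbuy", "mazon"]
pairs = distances_by_first_char(names)
```

With these sample strings, "amazn" and "amazon" are compared because both start with "a", but "mazon" is never compared with "amazon". That is the trade-off described in the note: fewer comparisons per run, at the cost of missing matches whose first characters differ.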