RMI-PACTA / 2dii-DataWarehouse


Jaro-Winkler Matching function #8

Open AlexAxthelm opened 4 years ago

AlexAxthelm commented 4 years ago

We need a function in the database that returns the Jaro-Winkler distance between two strings, since this is the algorithm we already use. We will likely need to provide our own implementation, because the Postgres extension pg_similarity has limited support on cloud hosting providers.
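For reference when porting this into a database function, here is a minimal Python sketch of Jaro-Winkler similarity. It follows the usual textbook definition: a matching window of max(len)/2 - 1, transpositions counted as half the number of out-of-order matched characters, and a prefix bonus of 0.1 per shared leading character (up to 4). The function names and defaults are hypothetical, the sketch does not apply Winkler's optional 0.7 boost threshold, and treating the distance as 1 - similarity is an assumption about which convention we want.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are this close in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start, end = max(0, i - window), min(i + window + 1, len2)
        for j in range(start, end):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both matched sequences in order and count positions that differ.
    mismatches, j = 0, 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[j]:
                j += 1
            if s1[i] != s2[j]:
                mismatches += 1
            j += 1
    t = mismatches / 2  # transpositions = half the out-of-order matches
    m = matches
    return (m / len1 + m / len2 + (m - t) / m) / 3.0


def jaro_winkler_similarity(s1: str, s2: str,
                            prefix_scale: float = 0.1,
                            max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    sim = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance taken as 1 - similarity (assumed convention)."""
    return 1.0 - jaro_winkler_similarity(s1, s2)


if __name__ == "__main__":
    print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
    print(round(jaro_winkler_similarity("DWAYNE", "DUANE"), 4))   # 0.84
```

The comparison is case-sensitive as written; lowercasing both inputs first (s1.lower(), s2.lower()) is one way to get case-insensitive matching.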

evan-2deg commented 4 years ago

Weren't there alternative extensions?

evan-2deg commented 4 years ago

One option is to extract the Jaro-Winkler part of the extension into a standalone function. Alex will be looking into this.

evan-2deg commented 4 years ago

http://alga42.blogspot.com/2016/05/jaro-winkler-in-plpgsql.html

It seems that forcing all characters to lowercase would make the function work.

Levenshtein matching is also available in Postgres, via the levenshtein() function in the fuzzystrmatch extension that ships with it.
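To compare the two approaches, here is a minimal Python sketch of Levenshtein distance with unit costs for insertion, deletion and substitution, which is assumed to match the default behaviour of levenshtein(). Unlike Jaro-Winkler it returns an integer edit count rather than a similarity score in [0, 1]. The function name is hypothetical and the code only illustrates the algorithm; it is not the Postgres implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # Keep b as the shorter string so each DP row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Because the raw count grows with string length, it usually needs normalizing (for example dividing by the longer length) before it can be used as a matching threshold alongside Jaro-Winkler scores.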