chalobest / ChaloBEST

Social Transport and Sustainable Mobility in Greater Mumbai
http://dev.chalobest.in

Determine thresholds and similarity measures for string matching #14

Open shekhark opened 11 years ago

subhodip commented 11 years ago

More details on this please.

batpad commented 11 years ago

We're using the TrigramSearch functionality in Postgres. The function accepts a "threshold" parameter that sets the minimum similarity a result must have to be returned. I'm not sure exactly what "Determine thresholds and similarity measures for string matching" means, but in general it would be good to test the current fuzzy matching of names, etc. against different threshold values and examples, and see what tweaks to the trigram string matching could improve it. Definitely something for later, not now, though.
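To make the threshold experiment concrete, here is a minimal pure-Python sketch that approximates how Postgres's pg_trgm module scores two strings: lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, collect the distinct three-character substrings, and divide the shared trigrams by all distinct trigrams. The functions `trigrams`, `similarity` and `sweep` and the sample name "Dadar Station" are hypothetical, and the word-splitting rule is an assumption (pg_trgm's treatment of non-ASCII characters depends on locale), so treat the scores as a rough guide rather than exactly what the database returns.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction (assumption, not the project's code):
    lowercase, split into alphanumeric words, pad each word as '  word '."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def sweep(query: str, names: list[str], thresholds: list[float]) -> dict[float, list[str]]:
    """For each candidate threshold, list the names that would still match."""
    return {t: [n for n in names if similarity(query, n) >= t] for t in thresholds}


if __name__ == "__main__":
    # "Dadar Station" is a hypothetical extra name added for contrast.
    names = ["Dadar Workshop", "Parel Workshop", "Dadar Station"]
    for t, hits in sweep("dadar workshop", names, [0.3, 0.4, 0.5]).items():
        print(t, hits)
```

With these names, "Parel Workshop" shares all nine "workshop" trigrams with the query and scores 9/21, about 0.43, so it passes pg_trgm's default threshold of 0.3 (and 0.4) and only drops out at 0.5, while "Dadar Station" (about 0.26) is already excluded at 0.3.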

shekhark commented 11 years ago

Two issues here: how results are sorted in a trigram search, and tweaking threshold values to improve search. When a user enters a nearly complete stop name, e.g. "dadar workshop", the results should not also append "parel workshop" after it. What is the sorting order? If results are sorted by similarity score, then tweak the thresholds and test for the best results.
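On the sorting question: a threshold only decides which rows come back, not their order, so putting the closest match first means sorting on the similarity score in descending order (in SQL, something like `ORDER BY similarity(name, query) DESC` using pg_trgm's `similarity()` function, where `name` and `query` are placeholders). The thread doesn't say whether the current query already does this. The self-contained Python sketch below illustrates the idea; `ranked_matches`, `trigram_set` and the approximated pg_trgm scoring are assumptions, not the project's code.

```python
import re


def trigram_set(text: str) -> set[str]:
    """Approximate pg_trgm trigrams (assumption): each lowercase word padded as '  word '."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def ranked_matches(query: str, names: list[str], threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    q = trigram_set(query)
    scored: list[tuple[str, float]] = []
    for name in names:
        n = trigram_set(name)
        union = q | n
        score = len(q & n) / len(union) if union else 0.0
        if score >= threshold:
            scored.append((name, score))
    # Highest similarity first; ties broken alphabetically so output is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    print(ranked_matches("dadar workshop", ["Parel Workshop", "Dadar Workshop"]))
    print(ranked_matches("dadar workshop", ["Parel Workshop", "Dadar Workshop"], threshold=0.5))
```

For "dadar workshop", "Dadar Workshop" scores 1.0 and comes first; "Parel Workshop" (about 0.43) still follows it unless the threshold is raised above that score, which is why sorting and threshold tuning are worth testing together.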