gajenc opened 2 years ago
The POC is still in progress.
Concept used: weighted fuzzy matching.
Fuzzy matching between WnS ‘houseno’ and DIGIT ‘doorno’, assigned the highest weight because a door number is unique to a house within a given locality.
Fuzzy matching between WnS ‘corraddress’ and the combination of DIGIT ‘region’ and ‘street’, assigned a low weight because the same street and region are shared by many properties.
Exact matching of locality code and pin code between the WnS and DIGIT tables. No fuzzy matching is used here, since locality code and pin code must match exactly.
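The three matching rules above can be sketched per WnS/DIGIT record pair as follows. This is a minimal illustration, not the POC code: it uses the standard-library difflib.SequenceMatcher as a stand-in for whatever fuzzy ratio the POC used, records are plain dicts, and the column names 'locality_code' and 'pincode' are hypothetical (only houseno, doorno, corraddress, region and street come from the tables described here).

```python
from difflib import SequenceMatcher


def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity on a 0-100 scale; difflib is a stand-in for the POC's fuzzy ratio."""
    # Normalise case and whitespace so "12 A" and "12  a" compare as equal.
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return 100.0 * SequenceMatcher(None, a_norm, b_norm).ratio()


def exact_keys_match(wns: dict, prop: dict) -> bool:
    """Locality code and pin code must be identical (no fuzzy matching).

    'locality_code' and 'pincode' are hypothetical column names.
    """
    return (wns["locality_code"], wns["pincode"]) == (prop["locality_code"], prop["pincode"])


def field_scores(wns: dict, prop: dict) -> dict:
    """Fuzzy ratios for the two weighted comparisons of one WnS/DIGIT pair."""
    return {
        # WnS houseno vs DIGIT doorno: the high-weight signal.
        "house": fuzzy_ratio(wns["houseno"], prop["doorno"]),
        # WnS corraddress vs DIGIT region + street: the low-weight signal.
        "address": fuzzy_ratio(wns["corraddress"], f"{prop['region']} {prop['street']}"),
    }
```

Keeping the exact-match check separate from the fuzzy scores lets it act as a cheap filter before any string comparison is done.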
Approach: an overall score was calculated from the fuzzy matching ratios of house numbers, door numbers, regions and street names. If a property's overall score was above a chosen threshold, that property was taken as a match for the record in the WnS table. The metrics for several thresholds are reported below.
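The scoring and threshold step can be sketched as a weighted average of the per-field fuzzy ratios. The weights 0.7 and 0.3 are hypothetical (the POC only says door number gets the highest weight and address a low one), difflib stands in for the actual fuzzy ratio, 'locality_code' and 'pincode' are hypothetical column names, and picking the single best-scoring property when several pass the threshold is an assumption of this sketch.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical weights: houseno/doorno dominates, corraddress vs region+street is minor.
WEIGHTS = {"house": 0.7, "address": 0.3}


def fuzzy_ratio(a: str, b: str) -> float:
    """0-100 similarity; difflib stands in for the POC's fuzzy ratio."""
    return 100.0 * SequenceMatcher(None, " ".join(a.lower().split()), " ".join(b.lower().split())).ratio()


def overall_score(wns: dict, prop: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average (0-100) of the per-field fuzzy ratios."""
    scores = {
        "house": fuzzy_ratio(wns["houseno"], prop["doorno"]),
        "address": fuzzy_ratio(wns["corraddress"], f"{prop['region']} {prop['street']}"),
    }
    return sum(weights[k] * scores[k] for k in weights) / sum(weights.values())


def match_record(wns: dict, properties: list, threshold: float) -> Optional[dict]:
    """Return the best DIGIT property whose overall score is above the threshold, else None."""
    best, best_score = None, threshold
    for prop in properties:
        # Exact gate: locality code and pin code must agree before any fuzzy scoring.
        if (wns["locality_code"], wns["pincode"]) != (prop["locality_code"], prop["pincode"]):
            continue
        score = overall_score(wns, prop)
        if score > best_score:  # strictly "above" the threshold
            best, best_score = prop, score
    return best
```

Raising the threshold makes the match stricter: fewer WnS records get a property, but the ones that do are more likely to be correct, which is the trade-off the metrics below measure.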
All metrics were measured on a sample of 100 records selected from the WnS table.
Threshold = 80: matches found = 58, matches containing houseno and doorno = 46, match ratio = 58/100 = 58%, accuracy = 46/58 = 79.3%
Threshold = 60: matches found = 72, matches containing houseno and doorno = 50, match ratio = 72/100 = 72%, accuracy = 50/72 = 69.4%
Threshold = 50: matches found = 76, matches containing houseno and doorno = 51, match ratio = 76/100 = 76%, accuracy = 51/76 = 67.1%
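The two metrics above are simple ratios: match ratio is matches found over the sample size, and accuracy is matches containing houseno and doorno over matches found. The short sketch below recomputes them from the reported counts; the function name threshold_metrics is illustrative only.

```python
def threshold_metrics(sample_size: int, matches_found: int, houseno_doorno_matches: int) -> tuple:
    """Return (match ratio %, accuracy %) as defined in this POC write-up."""
    match_ratio = 100.0 * matches_found / sample_size
    # Accuracy: share of found matches that also contain houseno and doorno.
    accuracy = 100.0 * houseno_doorno_matches / matches_found if matches_found else 0.0
    return round(match_ratio, 1), round(accuracy, 1)


# Counts reported for the 100-record WnS sample: threshold -> (found, with houseno/doorno).
REPORTED = {80: (58, 46), 60: (72, 50), 50: (76, 51)}

for threshold, (found, with_house_door) in REPORTED.items():
    ratio, accuracy = threshold_metrics(100, found, with_house_door)
    print(f"threshold={threshold}: match ratio={ratio}%, accuracy={accuracy}%")
```

This reproduces the reported figures and makes the trade-off explicit: lowering the threshold from 80 to 50 raises the match ratio from 58% to 76% but drops accuracy from 79.3% to 67.1%.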
However, both the approach and the accuracy criteria are at an early stage and need further development. We should get better clarity once we have the user PII data from DIGIT.