egovernments / Digit-Core

DIGIT is an open source modular Micro-services multi-tenant platform for public service delivery.
https://core.digit.org
MIT License

Property <> W&S matching for migration - Fuzzy matching logic exploring #16

Open gajenc opened 2 years ago

gajenc commented 2 years ago

The POC is still in progress.

Sarvesh-eGov commented 2 years ago

Concept used: weighted fuzzy matching

  1. Fuzzy match WnS ‘houseno’ against DIGIT ‘doorno’. This comparison gets the highest weight, because a door number is unique to a house within a given locality.

  2. Fuzzy match WnS ‘corraddress’ against the combination of DIGIT ‘region’ and ‘street’. This comparison gets a low weight, because many properties share the same street and region.

  3. Exact match on locality code and pin code between the WnS and DIGIT tables. No fuzzy matching is used here, because these two fields must match exactly.
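A minimal sketch of how the three rules above could be combined into one score. This is not the POC code: the weights (0.7 and 0.3), the dictionary keys `localitycode` and `pincode`, and the use of Python's standard `difflib.SequenceMatcher` as the fuzzy ratio are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the door number dominates, the address contributes less.
W_DOOR = 0.7
W_ADDRESS = 0.3


def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100] after lowercasing and collapsing whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    if not norm_a or not norm_b:
        # SequenceMatcher rates two empty strings as identical; treat as no evidence.
        return 0.0
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def weighted_score(wns: dict, digit: dict) -> float:
    """Weighted fuzzy score between one WnS record and one DIGIT property."""
    # Rule 3: locality code and pin code must match exactly (key names assumed).
    if wns["localitycode"] != digit["localitycode"]:
        return 0.0
    if wns["pincode"] != digit["pincode"]:
        return 0.0
    # Rule 1: houseno vs doorno, highest weight.
    door = fuzzy_ratio(wns["houseno"], digit["doorno"])
    # Rule 2: corraddress vs region + street, low weight.
    digit_address = f'{digit["region"]} {digit["street"]}'
    address = fuzzy_ratio(wns["corraddress"], digit_address)
    return W_DOOR * door + W_ADDRESS * address
```

Treating the exact-match fields as a gate rather than as another weighted term keeps a strong door-number match in the wrong locality from ever being scored as a candidate.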

Sarvesh-eGov commented 2 years ago

Approach: An overall score was calculated from the fuzzy matching ratios of the house number against the door number and of the correspondence address against the region and street names. If the overall score for a DIGIT property was above a chosen threshold, that property was taken as a match for the record in the WnS table. The metrics for various thresholds follow.
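The thresholding step could look roughly like the sketch below. It assumes the overall scores for one WnS record have already been computed and are held in a dictionary keyed by a hypothetical DIGIT property id. Picking only the highest-scoring property, and treating "above" as strictly greater than the threshold, are assumptions; the thread does not say how ties or multiple candidates were handled.

```python
from typing import Dict, Optional


def pick_match(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring DIGIT property id, or None if none clears the threshold."""
    if not scores:
        return None
    # Assumption: only the single highest-scoring candidate is considered.
    best_id = max(scores, key=scores.get)
    return best_id if scores[best_id] > threshold else None


# Example with made-up scores for one WnS record.
candidate_scores = {"PROP-1": 85.0, "PROP-2": 62.5}
print(pick_match(candidate_scores, threshold=80))
print(pick_match(candidate_scores, threshold=90))
```

Lowering the threshold accepts more candidates as matches, which is the trade-off the reported metrics explore.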

Sarvesh-eGov commented 2 years ago

Selected a sample of 100 records from the WnS table.

Threshold = 80: matches found = 58; matches containing houseno and doorno = 46; match ratio = 58%; accuracy = 46/58 = 79.3%

Threshold = 60: matches found = 72; matches containing houseno and doorno = 50; match ratio = 72%; accuracy = 50/72 = 69.4%

Threshold = 50: matches found = 76; matches containing houseno and doorno = 51; match ratio = 76%; accuracy = 51/76 = 67.1%
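For reference, the two percentages can be recomputed from the raw counts as follows. The function name and argument names are illustrative; the formulas are the ones used in the figures above (match ratio = matches found / sample size, accuracy = matches containing houseno and doorno / matches found).

```python
def match_metrics(sample_size: int, matches: int, matches_with_both: int):
    """Return (match ratio %, accuracy %) rounded to one decimal place."""
    match_ratio = 100.0 * matches / sample_size
    # Guard against division by zero when no matches were found.
    accuracy = 100.0 * matches_with_both / matches if matches else 0.0
    return round(match_ratio, 1), round(accuracy, 1)


# Counts reported in this thread for a sample of 100 WnS records:
# threshold -> (matches found, matches containing houseno and doorno)
reported = {80: (58, 46), 60: (72, 50), 50: (76, 51)}

for threshold, (found, with_both) in reported.items():
    ratio, accuracy = match_metrics(100, found, with_both)
    print(f"threshold={threshold}: match ratio={ratio}%, accuracy={accuracy}%")
```

The counts show the expected trade-off: as the threshold drops from 80 to 50, the match ratio rises from 58% to 76% while accuracy falls from 79.3% to 67.1%.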

However, the approach and the accuracy criteria are at a primitive stage and need further development. We will get better clarity once we have the user PII data from DIGIT.