dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

Best dataset to use for training/learning #113

Closed: jennguo9 closed this issue 2 years ago

jennguo9 commented 4 years ago

Hi all,

Is there a dedupe_dataframe_training.json and dedupe_dataframe_learned_settings file that has already been shown to be reasonably accurate? When I am asked to label the data, the outcome seems to depend on which pairs show up first.

For example, suppose the first prompt asks whether john, smith, johnsmith@email.com matches john.smith, johnsmith@gmail.com, and I answer yes. The next prompt asks whether john, Appleseed, johnappleseed@email.com, 1234567890 matches john, Appleseed, johnappleseed@email.com, 1234567890, and I answer yes again. The results then show the john smith pair with a higher confidence score than the pair that matched on all 4 attributes, which is not correct.

Please help. Thank you in advance.