Best dataset to use for training/learning

Hi all,

Is there a dedupe_dataframe_training.json and dedupe_dataframe_learned_settings file that has already been shown to be reasonably accurate? When I am asked to label the data, the outcome seems to depend on which pairs show up first. For example, suppose the first prompt asks whether john,smith, johnsmith@email.com matches john.smith, johnsmith@gmail.com, and I answer yes. The next prompt then asks whether john, Appleseed, johnappleseed@email.com, 1234567890 matches john, Appleseed, johnappleseed@email.com, 1234567890, and I answer yes again. In the results, the john smith pair gets a higher confidence score than the pair that matched on all 4 attributes, which is not correct.

Please help. Thank you in advance.
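
To make the expected ordering concrete, here is a minimal sketch of a naive sanity check: the fraction of shared fields that agree exactly. This is not how dedupe computes its confidence scores. The helper `field_agreement`, the field names (`first`, `last`, `email`, `phone`), and the way the example records are split into fields are all assumptions made up for illustration.

```python
def field_agreement(a, b):
    """Fraction of fields present in both records whose values match
    after trimming whitespace and lowercasing (hypothetical helper)."""
    shared = [k for k in a if k in b and a[k] is not None and b[k] is not None]
    if not shared:
        return 0.0
    same = sum(
        1 for k in shared
        if str(a[k]).strip().lower() == str(b[k]).strip().lower()
    )
    return same / len(shared)


# Pair 1: names agree, email domains differ (field split is an assumption).
pair_partial = (
    {"first": "john", "last": "smith", "email": "johnsmith@email.com"},
    {"first": "john", "last": "smith", "email": "johnsmith@gmail.com"},
)

# Pair 2: all four attributes are identical.
pair_full = (
    {"first": "john", "last": "Appleseed",
     "email": "johnappleseed@email.com", "phone": "1234567890"},
    {"first": "john", "last": "Appleseed",
     "email": "johnappleseed@email.com", "phone": "1234567890"},
)

score_partial = field_agreement(*pair_partial)
score_full = field_agreement(*pair_full)

print(f"partial match: {score_partial:.2f}")
print(f"full match:    {score_full:.2f}")
```

Under this naive measure the pair that agrees on all four attributes ranks above the pair whose emails differ, which is the ordering I expected from the dedupe results.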

dedupeio / dedupe-examples, issue #113