Getting limited clusters and few records missed out while scoring

dedupeio / dedupe-examples

Examples for using the dedupe library

MIT License

Getting limited clusters and few records missed out while scoring #94

Closed: bharath-ts closed this issue 4 years ago

bharath-ts commented 5 years ago

Hi, I'm using dedupe (version 1.6.5 on Windows) to identify duplicate records in a dataset. I am loading a trained model and scoring a new dataset of 20,000 records. I know there are 10,000 duplicates in it, but the model returns only 6,000 clusters, which cover only 12,000 records. The remaining records are not assigned to any cluster even though they have duplicates. Please suggest a solution.
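A toy model of threshold-based clustering makes it easier to see why records can go unclustered. The sketch below is not how dedupe clusters internally. It assumes hypothetical pairwise match probabilities, links every pair scoring at or above a threshold (single-linkage via union-find), and then reports how many records landed in a cluster. The names `cluster_pairs` and `coverage` and the sample scores are made up for illustration. Counting records this way also checks the arithmetic in the question: 6,000 clusters equal 12,000 records only if every cluster is exactly a pair.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical input: (record_id_a, record_id_b, match_probability) triples,
# standing in for the pairwise scores a trained model would produce.
ScoredPair = Tuple[Hashable, Hashable, float]


def cluster_pairs(scored_pairs: Iterable[ScoredPair], threshold: float) -> List[Set[Hashable]]:
    """Link every pair scoring at or above `threshold` and return the
    connected components that contain two or more records."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Walk to the root, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        # Pairs below the threshold are ignored entirely.
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    # Group every linked record under its root.
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for record_id in list(parent):
        clusters.setdefault(find(record_id), set()).add(record_id)
    return [c for c in clusters.values() if len(c) > 1]


def coverage(clusters: List[Set[Hashable]], total_records: int) -> Tuple[int, int]:
    """Return (records placed in a cluster, records left unclustered)."""
    clustered = sum(len(c) for c in clusters)
    return clustered, total_records - clustered


if __name__ == "__main__":
    # Hypothetical scores; r5/r6 plays a true duplicate the model scores low.
    pairs = [
        ("r1", "r2", 0.92),
        ("r3", "r4", 0.81),
        ("r5", "r6", 0.38),
        ("r2", "r7", 0.55),
    ]
    for t in (0.5, 0.3):
        found = cluster_pairs(pairs, t)
        print(t, len(found), coverage(found, total_records=7))
    # prints:
    # 0.5 2 (5, 2)
    # 0.3 3 (7, 0)
```

In this toy data, lowering the threshold from 0.5 to 0.3 pulls `r5` and `r6` into a cluster. A pair that blocking never proposes as a candidate receives no score at all, so no threshold can place it in a cluster.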

fgregg commented 4 years ago

Not enough detail.