dedupeio / dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
https://docs.dedupe.io
MIT License

Update an existing model #1214

Open havardox opened 2 days ago

Continuing the discussion from #672, which was closed even though the issue still stands. Are there any plans to add a way to update an existing model with new training data, instead of retraining the entire model from scratch?

So this:

linker = dedupe.RecordLink(fields)
linker.prepare_training(messy_data, canonical_data, training_file=tf)

would become something like this, where settings_file points at the previously saved model:

linker = dedupe.RecordLink(fields)
linker.prepare_training(messy_data, canonical_data, settings_file=sf, training_file=tf)
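For comparison, the closest thing available today, as far as I can tell, is to keep all labeled pairs in a training file and retrain on the accumulated labels each session. Below is a minimal sketch of that workaround, not of the requested feature. The function name `retrain_with_saved_labels` and the two path arguments are hypothetical names for illustration; the dedupe calls (`prepare_training`, `console_label`, `train`, `write_training`, `write_settings`) are the existing ones, and `fields` is assumed to be whatever variable definition your dedupe version expects.

```python
import os


def retrain_with_saved_labels(fields, messy_data, canonical_data,
                              training_path, settings_path):
    """Retrain from scratch, reusing every label saved in earlier sessions.

    Hypothetical helper: the model is NOT updated incrementally here.
    """
    import dedupe  # imported lazily; requires the dedupe package

    linker = dedupe.RecordLink(fields)

    # Seed the active learner with previously saved labels, if any exist.
    if os.path.exists(training_path):
        with open(training_path) as tf:
            linker.prepare_training(messy_data, canonical_data,
                                    training_file=tf)
    else:
        linker.prepare_training(messy_data, canonical_data)

    dedupe.console_label(linker)  # interactively label additional pairs
    linker.train()                # full retrain on old + new labels

    # Persist the accumulated labels and the newly learned settings.
    with open(training_path, "w") as tf:
        linker.write_training(tf)
    with open(settings_path, "wb") as sf:
        linker.write_settings(sf)

    return linker
```

This keeps the labeling effort from being lost between sessions, but the blocking rules and classifier are still learned again from scratch each time, which is exactly the cost this issue asks to avoid.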