dssg / pgdedupe

A simple command line interface to the datamade/dedupe library.
https://pgdedupe.readthedocs.io

Store labeled examples in a table #63

Open ecsalomon opened 7 years ago

ecsalomon commented 7 years ago

Labeled training example pairs should be stored in a table for selection and reuse. Data stored for examples should include:

Storing examples like this allows them to be reused in the following ways:

This will also entail a test that any model is trained on only one label per pair!
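A minimal sketch of what such a one-label-per-pair check might look like, assuming the labeled examples are read from the table as `(record_id_a, record_id_b, label)` rows. The row layout, the function names, and the `"match"` / `"distinct"` label values are hypothetical here and are not taken from pgdedupe itself.

```python
from collections import defaultdict


def conflicting_pairs(labeled_rows):
    """Return {pair: labels} for every pair that carries more than one distinct label.

    `labeled_rows` is an iterable of (record_id_a, record_id_b, label) tuples
    (a hypothetical layout for the stored examples).
    """
    labels_by_pair = defaultdict(set)
    for id_a, id_b, label in labeled_rows:
        # Treat (a, b) and (b, a) as the same pair.
        pair = tuple(sorted((id_a, id_b)))
        labels_by_pair[pair].add(label)
    return {pair: labels for pair, labels in labels_by_pair.items() if len(labels) > 1}


def assert_one_label_per_pair(labeled_rows):
    """Raise ValueError if any pair would be trained on with conflicting labels."""
    conflicts = conflicting_pairs(labeled_rows)
    if conflicts:
        raise ValueError(f"pairs with conflicting labels: {sorted(conflicts)}")


# Example rows (made up for illustration).
rows = [
    (1, 2, "match"),
    (2, 1, "match"),     # same pair repeated with the same label: fine
    (3, 4, "distinct"),
    (4, 3, "match"),     # same pair with a different label: conflict
]
```

Calling `assert_one_label_per_pair` on the selected examples before training would reject the last row above, since pair `(3, 4)` is labeled both ways. An alternative would be to enforce this in the database with a uniqueness constraint on the pair, but that would also prevent storing several labels per pair for later selection.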