[ STUB ] Dedupe core R&D

I'll write up a more detailed issue about this on Friday. For now, this is a quick note that I plan to work on Dedupe core issues in roughly the following order:

1. Allowing 32-bit floats instead of 64-bit doubles in fastcluster
2. Improving the connected component search algorithm to make it less memory-intensive
3. Defining a test harness for testing different performance metrics
4. Using blocks as a feature for the classifier
5. Researching different approaches to sampling record pairs for active labelling
6. Researching different learning routines (connects #55)
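
As rough motivation for the first item: assuming the clustering input is a condensed pairwise distance array, it holds n(n-1)/2 entries for n records, so halving the element width halves its footprint. The sketch below uses only Python's standard `array` module to show the arithmetic. It does not call fastcluster, and the helper name `condensed_bytes` is hypothetical.

```python
from array import array


def condensed_bytes(n_records, typecode):
    """Bytes needed for a condensed pairwise-distance array.

    Hypothetical helper for illustration only; not part of dedupe
    or fastcluster. `typecode` is an `array` module type code,
    e.g. "d" for C double or "f" for C float.
    """
    n_pairs = n_records * (n_records - 1) // 2
    return n_pairs * array(typecode).itemsize


n = 10_000
as_double = condensed_bytes(n, "d")  # 64-bit doubles
as_float = condensed_bytes(n, "f")   # 32-bit floats
print(f"double: {as_double} bytes, float: {as_float} bytes")
```

The saving comes at the cost of precision in the stored distances, which is part of what the planned test harness would need to measure.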
