Closed chansooligans closed 2 years ago
But also, and just as importantly, rewrite dedupe.learner.best_schemes and dedupe.learner.get_comparisons.
The goal is for the n_pairs column in dedupe.learner.df_conjunctions to be a cumulative count of distinct pairs, so that it shows how many new pairs each additional scheme contributes.
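A rough sketch of what "distinct cumulative" could mean here, in plain Python. The function name `cumulative_distinct_pairs`, the `n_new_pairs` field, and the example schemes are hypothetical and are not taken from dedupe.learner; it assumes each scheme yields pairs of record ids and that (a, b) and (b, a) count as the same pair.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def cumulative_distinct_pairs(
    schemes: Iterable[Tuple[str, Iterable[Pair]]],
) -> List[Dict[str, object]]:
    """Hypothetical helper: for schemes taken in order, count how many
    pairs each one adds that no earlier scheme already produced."""
    seen: Set[Pair] = set()
    rows: List[Dict[str, object]] = []
    for name, pairs in schemes:
        # Normalize so (2, 1) and (1, 2) are treated as the same pair.
        normalized = {tuple(sorted(p)) for p in pairs}
        new_pairs = normalized - seen
        seen |= new_pairs
        rows.append(
            {
                "scheme": name,
                "n_new_pairs": len(new_pairs),  # pairs added by this scheme only
                "n_pairs": len(seen),           # distinct cumulative total
            }
        )
    return rows


# Made-up example data, for illustration only.
example = [
    ("first_name", [(1, 2), (2, 3)]),
    ("zip_code", [(2, 1), (3, 4)]),
    ("last_name", [(1, 2), (3, 4)]),
]
rows = cumulative_distinct_pairs(example)
for row in rows:
    print(row)
```

With this definition, a scheme that only rediscovers pairs already covered leaves n_pairs unchanged, which makes redundant schemes easy to spot.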
First, modify the forward index builder so that there is an option to add one column at a time to blocks_train / blocks_df.
Currently it just drops and recreates the table.
Instead, it should add indices to the existing table rather than rebuilding it each time.
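A minimal sketch of the "add a column instead of rebuilding" idea. The storage backend is not stated here, so this uses Python's built-in sqlite3 purely as a stand-in; the function `add_scheme_column`, the `record_id` key column, and the scheme names are all hypothetical.

```python
import sqlite3
from typing import Dict


def add_scheme_column(
    conn: sqlite3.Connection, table: str, scheme: str, values: Dict[int, str]
) -> None:
    """Hypothetical helper: add one scheme's block keys as a new column on an
    existing table, without dropping and recreating it.
    Note: table and scheme names are interpolated into SQL, so they must come
    from trusted code, not user input."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if scheme not in existing:
        # Only alter the table when the column is missing.
        conn.execute(f'ALTER TABLE {table} ADD COLUMN "{scheme}" TEXT')
    conn.executemany(
        f'UPDATE {table} SET "{scheme}" = ? WHERE record_id = ?',
        [(value, record_id) for record_id, value in values.items()],
    )
    conn.commit()


# Made-up example: start with ids only, then add schemes one at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks_train (record_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO blocks_train (record_id) VALUES (?)", [(1,), (2,), (3,)])

add_scheme_column(conn, "blocks_train", "first_letter", {1: "a", 2: "b", 3: "a"})
add_scheme_column(conn, "blocks_train", "zip3", {1: "100", 2: "100", 3: "112"})

columns = [row[1] for row in conn.execute("PRAGMA table_info(blocks_train)")]
table_rows = conn.execute("SELECT * FROM blocks_train ORDER BY record_id").fetchall()
print(columns)
print(table_rows)
```

Columns from earlier schemes are left untouched when a new one is added, so only the new scheme's values have to be computed.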