chansooligans / oagdedupe

Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution that uses active learning to learn blocking configurations, generate comparison pairs, and then classify matches
https://oagdedupe.readthedocs.io/en/latest/
MIT License

iteratively evaluate conjunction to get full_comparisons #33

Closed chansooligans closed 2 years ago

chansooligans commented 2 years ago

add indices to table instead of rebuilding each time

chansooligans commented 2 years ago

but also, just as importantly: rewrite dedupe.learner.best_schemes and dedupe.learner.get_comparisons

the goal is for the n_pairs column in dedupe.learner.df_conjunctions to be a cumulative count of distinct pairs, so it shows how many new pairs each additional scheme adds
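
a rough sketch of the "distinct cumulative" idea, not the actual dedupe.learner code: it assumes each scheme can be reduced to a list of record-id pairs, and the function name, the n_new_pairs field and the scheme names are all made up for illustration

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def cumulative_distinct_pairs(
    schemes: Iterable[Tuple[str, Iterable[Pair]]],
) -> List[Dict[str, object]]:
    """Walk the schemes in order and count distinct pairs seen so far.

    Hypothetical helper: `schemes` is an ordered iterable of
    (scheme_name, pairs) tuples, where pairs are record-id pairs.
    """
    seen: Set[Pair] = set()
    rows: List[Dict[str, object]] = []
    for name, pairs in schemes:
        # Treat (a, b) and (b, a) as the same comparison.
        normalized = {tuple(sorted(pair)) for pair in pairs}
        new_pairs = normalized - seen
        seen |= new_pairs
        rows.append(
            {
                "scheme": name,
                "n_new_pairs": len(new_pairs),  # pairs this scheme adds
                "n_pairs": len(seen),  # distinct cumulative total
            }
        )
    return rows


# Made-up scheme names and pairs, purely for illustration.
example = [
    ("first_letter_name", [(1, 2), (2, 3)]),
    ("exact_zip", [(3, 2), (4, 5)]),
    ("first_letter_name AND exact_zip", [(1, 2)]),
]
result = cumulative_distinct_pairs(example)
```

with this, a scheme that only re-finds pairs already covered by earlier schemes shows n_new_pairs of 0 and leaves n_pairs unchanged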

chansooligans commented 2 years ago

first, modify forward index builder so that there is an option to add a column at a time to blocks_train / blocks_df

currently it just drops and recreates the table

but it should:

  1. create table if it does not exist
  2. create column if it does not exist
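
a minimal sketch of those two steps, using the stdlib sqlite3 module only so it is self-contained (the project's real database and forward index builder may look quite different). the function name, the _index key column and the scheme column name are assumptions for illustration

```python
import sqlite3

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def add_block_column(
    conn: sqlite3.Connection, table: str, column: str, col_type: str = "TEXT"
) -> bool:
    """Create `table` if missing, then add `column` if missing.

    Returns True if the column was added, False if it already existed.
    Hypothetical helper; `_index` is an assumed key column name.
    """
    # Identifiers cannot be bound as SQL parameters, so validate them.
    for ident in (table, column):
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    if col_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported column type: {col_type!r}")

    # 1. create table if it does not exist
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (_index INTEGER PRIMARY KEY)'
    )

    # 2. create column if it does not exist
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if column in existing:
        return False
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {col_type}')
    return True


conn = sqlite3.connect(":memory:")
first = add_block_column(conn, "blocks_train", "first_letter_name")
second = add_block_column(conn, "blocks_train", "first_letter_name")
columns = [row[1] for row in conn.execute('PRAGMA table_info("blocks_train")')]
```

calling it twice is safe: the second call finds the column and does nothing. if the backing database is PostgreSQL, step 2 can be a single statement, since PostgreSQL supports ALTER TABLE ... ADD COLUMN IF NOT EXISTS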