Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches
One is to sample conjunctions: build a correlation matrix for individual schemes (or pairs of schemes), then use it together with sampling to identify the best conjunctions of length k (e.g. k=4)
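A minimal sketch of the sampling idea, not the library's implementation. It assumes each blocking scheme can be summarised as a boolean row recording which candidate pairs it puts in the same block. The function name `sample_conjunctions`, the `coverage` array layout, and the choice to rank by lowest mean pairwise correlation are all hypothetical, made up for illustration.

```python
import numpy as np


def sample_conjunctions(coverage, k=4, n_samples=200, seed=0):
    """Sample conjunctions of k schemes and summarise each one.

    coverage: (n_schemes, n_pairs) boolean array (hypothetical layout);
        coverage[i, j] is True when scheme i blocks candidate pair j together.
    Returns (corr, results), where results is a list of
    (scheme_indices, mean_pairwise_correlation, fraction_of_pairs_covered),
    sorted by mean pairwise correlation, lowest first.
    """
    coverage = np.asarray(coverage, dtype=bool)
    n_schemes = coverage.shape[0]
    if not 2 <= k <= n_schemes:
        raise ValueError("k must be between 2 and the number of schemes")

    # Correlation matrix between individual schemes. Constant rows have
    # zero variance and yield NaN, which is mapped to 0 here (an assumption).
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(coverage.astype(float)))

    rng = np.random.default_rng(seed)
    seen = set()
    results = []
    for _ in range(n_samples):
        combo = tuple(sorted(rng.choice(n_schemes, size=k, replace=False).tolist()))
        if combo in seen:
            continue  # skip conjunctions that were already sampled
        seen.add(combo)
        idx = list(combo)

        # Mean off-diagonal correlation among the k schemes in the sample.
        sub = corr[np.ix_(idx, idx)]
        mean_corr = (sub.sum() - np.trace(sub)) / (k * (k - 1))

        # A conjunction covers a pair only if every scheme in it does.
        covered_fraction = coverage[idx].all(axis=0).mean()
        results.append((combo, float(mean_corr), float(covered_fraction)))

    # Ranking by lowest mean correlation is an illustrative choice only.
    results.sort(key=lambda r: r[1])
    return corr, results
```

The correlation score is only a cheap proxy for narrowing the search; the top few sampled conjunctions would still need to be evaluated against labeled pairs before any of them is chosen.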
A couple of issues remain:
ideas: