J535D165 / recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python
http://recordlinkage.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Option to return intersection of pairs returned from indexers rather than union #160

Open chriskl opened 3 years ago

chriskl commented 3 years ago

In our application we want to generate pairs where all the following are true:

However, if we add all three of these at once, we get the union of them all (and the classification exact means 45M pairs). It would be lovely if there were an intersection flag we could pass in, rather than our current approach of generating the three sets of pairs separately and then calling MultiIndex.intersection() on them all.
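
For reference, the workaround described above can be sketched roughly as follows. This is only an illustration, not part of the library: `intersect_pairs` is a hypothetical helper name, and the three hand-built `MultiIndex` objects are made-up stand-ins for the candidate pairs that three separate indexers would return.

```python
from functools import reduce

import pandas as pd


def intersect_pairs(*pair_sets: pd.MultiIndex) -> pd.MultiIndex:
    """Keep only the candidate pairs that appear in every input MultiIndex.

    Hypothetical helper for illustration; it is not a recordlinkage function.
    """
    if not pair_sets:
        raise ValueError("need at least one set of candidate pairs")
    # Fold pairwise: MultiIndex.intersection keeps the pairs common to both sides.
    return reduce(lambda acc, nxt: acc.intersection(nxt), pair_sets)


# Made-up stand-ins for the pairs produced by three separate indexing rules.
pairs_a = pd.MultiIndex.from_tuples([(0, 1), (0, 2), (1, 2), (2, 3)])
pairs_b = pd.MultiIndex.from_tuples([(0, 1), (1, 2), (2, 3)])
pairs_c = pd.MultiIndex.from_tuples([(1, 2), (2, 3), (3, 4)])

# Only the pairs present in all three sets survive.
common = intersect_pairs(pairs_a, pairs_b, pairs_c)
```

In practice each input would presumably come from its own indexer holding a single rule, with the resulting `MultiIndex` objects passed to the helper. The cost is that every full set of pairs has to be materialised before it can be narrowed down, which is what an intersection option on the indexer would avoid.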