The SQL/Ibis-powered sklearn of record linkage.
Still in alpha. Breaking changes will happen frequently and without warning. Once things stabilize, I will come up with a stability policy. Any suggestions on what you want the API to look like would be greatly appreciated.
I have claimed mismo on PyPI, but I won't update it often until this is more stable. Until then, install from source:
python -m pip install "mismo[viz] @ git+https://github.com/NickCrews/mismo@<SOME-SHA-OR-BRANCH>"
Mismo tries to be the sklearn of record linkage, backed by the scalability and power of SQL and Ibis. It is made of many small data structures and functions, each with a well-defined and standard API that allows them to be composed together and extended easily. None of the other record linkage packages I have seen, such as Splink, Dedupe, or Record Linkage Toolkit, had all of these properties, so I decided to make my own.
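To make the idea of "small functions with a standard API that compose" concrete, here is a minimal pure-Python sketch. It is not Mismo's API and does not use Ibis or SQL: the names `Blocker`, `Comparer`, `block_on`, `exact_match`, and `link` are all hypothetical, chosen only to illustrate how a blocking step and a comparison step can be swapped independently when each follows a fixed signature.

```python
from itertools import combinations
from typing import Callable, Iterable

Record = dict[str, str]
Pair = tuple[Record, Record]

# Hypothetical "standard API" (an assumption for illustration, not Mismo's):
# a blocker turns records into candidate pairs, a comparer scores one pair.
Blocker = Callable[[list[Record]], Iterable[Pair]]
Comparer = Callable[[Record, Record], float]


def block_on(key: str) -> Blocker:
    """Build a blocker that only pairs records sharing the same value for `key`."""

    def blocker(records: list[Record]) -> Iterable[Pair]:
        for a, b in combinations(records, 2):
            if a[key] == b[key]:
                yield a, b

    return blocker


def exact_match(field: str) -> Comparer:
    """Build a comparer that scores 1.0 on a case-insensitive match, else 0.0."""

    def comparer(a: Record, b: Record) -> float:
        return 1.0 if a[field].lower() == b[field].lower() else 0.0

    return comparer


def link(
    records: list[Record],
    blocker: Blocker,
    comparers: list[Comparer],
    threshold: float,
) -> list[tuple[str, str]]:
    """Return id pairs whose mean comparer score is at least `threshold`."""
    links = []
    for a, b in blocker(records):
        score = sum(c(a, b) for c in comparers) / len(comparers)
        if score >= threshold:
            links.append((a["id"], b["id"]))
    return links


records = [
    {"id": "1", "name": "Ann Lee", "zip": "99501"},
    {"id": "2", "name": "ann lee", "zip": "99501"},
    {"id": "3", "name": "Bob Ray", "zip": "99501"},
    {"id": "4", "name": "Ann Lee", "zip": "10001"},
]

# Compose the pieces: block on zip code, then compare names exactly.
links = link(records, block_on("zip"), [exact_match("name")], threshold=1.0)
print(links)
```

Because every blocker and comparer shares one signature, replacing `block_on("zip")` or adding a second comparer needs no change to `link`. That is the kind of composability the paragraph above describes, shown here on in-memory dicts rather than on SQL tables.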
See Goals and Alternatives for a more detailed discussion of the goals of Mismo and how it compares to other record linkage packages.
See the example notebook.
See the documentation.
See the contributing guide.
mismo is distributed under the terms of the LGPL-3.0-or-later license.