-
### What happens?
Hello!
The Splink comparison viewer bar plot and waterfall chart don't seem to agree on the match probability. However, they agree on the match weight, which doesn't seem normal. The pr…
-
The test `tests/test_realtime.py::test_realtime_cache_different_settings[duckdb]` failed on Python 3.9 [on this run](https://github.com/ADBond/splink/actions/runs/11932686224/job/33258155342?pr=41). R…
-
### What happens?
When working on clusters that are in shared mode on Unity Catalog, Splink throws py security exceptions.
### To Reproduce
![Image](https://github.com/user-attachments/assets/df41e0…
-
### Is your proposal related to a problem?
`pseudopeople` has the install requirement `"numpy=2.0.0",`
### Describe alternatives you've considered
An alternative is to keep the current versi…
-
This fails in Spark:
```
r1 = {
    "first_name": "John",
    "surname": "Smith",
    "dob": "1980-01-01",
}
r2 = {
    "first_name": "John",
    "surname": "Smith",
    "dob": None,
}
…
-
### What happens?
Using Athena as the SQL backend (`AthenaLinker`), I am attempting to call `invalidate_cache()` to force the latest input data to be used. However, this results in the following erro…
-
```
import duckdb
from splink import DuckDBAPI, Linker, SettingsCreator, splink_datasets
con = duckdb.connect()
db_api = DuckDBAPI(connection=con)
df = splink_datasets.fake_1000
settings…
-
**Describe the bug**
Ran into an out-of-range error when using the tool with this CSV input:
```
"id","truth_value","family_name","given_name","gender","birth_date","phone","street_address","city","state"…
-
### Is your proposal related to a problem?
We've been working with splink to do a bit of address matching, and have identified cases where:
1. We can't use Levenshtein distance because it wouldn't…
-
After looking over its documentation in more detail, I've seen that splink has some nice utilities for [visualising string comparison metrics](https://moj-analytical-services.github.io/splink/topic_gu…