Open fgregg opened 2 years ago
Splink uses something very similar to method 2. See https://youtu.be/msz3T741KQI?t=2035 for a nice explanation of how they think about the different "types" of comparisons that can happen. The rest of the video has some other great thoughts and visualizations too.
Right now, blocking and scoring are two distinct phases.
All the information about how two records came to be blocked together is unused by the scorer. This is a bit silly, as the fact that two records are blocked together by multiple predicates could be a pretty good indicator of co-reference.
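To make the discarded signal concrete, here is a minimal sketch in plain Python. It does not use the library's actual blocking code: the three predicate functions, the record fields, and the `shared_predicates` helper are all hypothetical, made up for illustration. It records which predicates put each candidate pair in the same block, so the count (or the set itself) could in principle be handed to the scorer as an extra feature.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical blocking predicates: each maps a record to a set of block keys.
def first_three_chars(record):
    return {record["name"][:3].lower()}

def same_zip(record):
    return {record["zip"]}

def first_token(record):
    return {record["name"].split()[0].lower()}

PREDICATES = [first_three_chars, same_zip, first_token]

def shared_predicates(records):
    """Map each candidate pair to the names of the predicates that blocked it."""
    # Build the blocks: (predicate name, key) -> ids of records in that block.
    blocks = defaultdict(set)
    for record_id, record in records.items():
        for predicate in PREDICATES:
            for key in predicate(record):
                blocks[(predicate.__name__, key)].add(record_id)

    # Every pair inside a block was brought together by that block's predicate.
    pair_predicates = defaultdict(set)
    for (name, _key), ids in blocks.items():
        for pair in combinations(sorted(ids), 2):
            pair_predicates[pair].add(name)
    return dict(pair_predicates)

# Made-up example records.
records = {
    1: {"name": "Robert Smith", "zip": "60601"},
    2: {"name": "Robert Smyth", "zip": "60601"},
    3: {"name": "Roberta Jones", "zip": "60615"},
}

pairs = shared_predicates(records)
# Candidate extra feature for the scorer: how many predicates agree on a pair.
n_shared = {pair: len(names) for pair, names in pairs.items()}
```

Here records 1 and 2 are blocked together by all three predicates, while record 3 shares only the three-character prefix block with each of them. Whether a raw count, one indicator per predicate, or something else is the right encoding is exactly the open question.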
I'm not sure what the best way to take advantage of blocking information in scoring is, though.
A few ideas:
In both cases, I'm not quite sure how to set up the training.