-
Hi @ruanchaves, I finally found a free slot to go through the tutorial. Here's a detailed review:
1. I'd try to streamline the preprocessing part. Maybe create a preprocessed dataset and move the p…
-
Currently, calling `match` on the `RecordLink` and `Gazetteer` classes with one candidate returns only the top match. Let's make it return up to 10.
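A minimal sketch of the requested behavior, assuming the matcher already produces `(record_id, score)` pairs for a candidate. The names `top_matches`, `n_matches`, and `threshold` are hypothetical and used only for illustration; this is not the library's actual implementation or signature.

```python
import heapq
from typing import Iterable, List, Tuple


def top_matches(
    scored: Iterable[Tuple[str, float]],
    n_matches: int = 10,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to n_matches (record_id, score) pairs, best score first.

    Hypothetical helper: pairs scoring at or below `threshold` are dropped.
    """
    eligible = (pair for pair in scored if pair[1] > threshold)
    # nlargest keeps only the n best pairs and returns them in descending order
    return heapq.nlargest(n_matches, eligible, key=lambda pair: pair[1])


# Made-up scores for a single candidate record
candidates = [("a", 0.91), ("b", 0.42), ("c", 0.77), ("d", 0.05)]
print(top_matches(candidates, n_matches=2))  # [('a', 0.91), ('c', 0.77)]
```

With the default `n_matches=10`, a candidate with fewer than 10 scored pairs simply gets all of them back, so callers that only want the top match can still pass `n_matches=1`.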
-
The unofficial gazetteer entry "Washington, DC" and similar forms are flagged as nonsense by the punctuation filter.
Test / solution: if both the match and the test entry contain ", " and are near-exact matches, then the …
-
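I see that in the vast majority of other languages it is far ahead of the runner-up, but in Chinese it falls well short of the best model. Has there been any analysis of why?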
-
Hi,
I have a deduped list and want to run dedupe on a new file, appending its records to the clusters I already created without changing the clusterId.
Is there a way to do that? I do not want to run the dedupe …
-
## Info about spaCy
- **spaCy version:** 3.2.4
- **Platform:** Linux-5.16.15-76051615-generic-x86_64-with-glibc2.34
- **Python version:** 3.9.7
- **Pipelines:** en_core_web_md (3.2.0), en_core_w…
-
https://github.com/dedupeio/dedupe/blob/65252112844f9951c33a67bcc10a20a6617b160e/dedupe/core.py#L316
Why is the `smaller_ids` information (on this line, the third element of the tuples) simply ign…
-
_(Whoops—originally posted this as dedupeio/dedupe-examples#101, whereas it is probably better for this repo. Moving it over here instead.)_
Thanks for this awesome project! We're really exci…
-
**Describe the bug**
General geotagging produces obscure references to AIRF or AIRP. We should add a "transportation" gazetteer feature so that alphanumeric codes specifically might be allowable in general g…
-