-
Got a report that after `pip install gatenlp[formats,stanza,spacy,nltk,gazeteers,notebook]` recordclass was not installed, although it is listed in the `gazetteers` extra in setup.py.
Need to inv…
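One thing that may be worth checking first: the command above spells the extra `gazeteers`, while the report says setup.py lists `gazetteers`. The sketch below compares requested extras against declared ones using only the standard library. The `declared` list is an assumption pieced together from the names in this report, not read from the real setup.py, and `parse_extras` and `unknown_extras` are hypothetical helper names.

```python
import difflib
import re


def parse_extras(requirement):
    """Return the extras named in a requirement string such as 'pkg[a,b]'."""
    match = re.search(r"\[([^\]]*)\]", requirement)
    if not match:
        return []
    return [e.strip() for e in match.group(1).split(",") if e.strip()]


def unknown_extras(requested, declared):
    """Map each requested extra that is not declared to its closest declared name."""
    return {
        name: difflib.get_close_matches(name, declared, n=1)
        for name in requested
        if name not in declared
    }


# Assumed list of declared extras (hypothetical, based on the report text).
declared = ["formats", "stanza", "spacy", "nltk", "gazetteers", "notebook"]
requested = parse_extras("gatenlp[formats,stanza,spacy,nltk,gazeteers,notebook]")
problems = unknown_extras(requested, declared)
print(problems)
```

If the mismatch is real, the dependencies of that extra would simply never be requested, which would be consistent with recordclass not being installed.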
-
@NickCrews If a build has a significant performance decline, asv exits with a non-zero code. This somehow prevents the benchmark output from being set as an env variable in the workflow. I tried …
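A minimal sketch of one possible workaround, assuming the workflow is a GitHub Actions job: run the command without raising on a non-zero exit, keep the output, and format it with the multi-line `NAME<<DELIMITER` syntax used by the `GITHUB_ENV` file. The command here is a stand-in that prints a line and exits with code 1. The variable name `BENCH_OUTPUT`, the delimiter, and both helper functions are hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit_code, combined output) without raising on failure."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def format_env_entry(name, value, delimiter="EOF_BENCH"):
    """Format a possibly multi-line value in the NAME<<DELIMITER form."""
    return f"{name}<<{delimiter}\n{value.rstrip()}\n{delimiter}\n"


# Stand-in for the benchmark command: prints output, then exits non-zero.
cmd = [sys.executable, "-c", "print('benchmark regressed'); raise SystemExit(1)"]
code, output = run_and_capture(cmd)
entry = format_env_entry("BENCH_OUTPUT", output)

# In a real job, entry would be appended to the file named by $GITHUB_ENV,
# and the step could then exit with `code` to still report the failure.
print(code)
print(entry, end="")
```

Separating "capture the output" from "propagate the exit code" lets a later step read the variable even when the benchmark reports a regression.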
-
Currently, the maps shown on place and location pages display convex hulls around polyline and polygon geometries in locations, rather than the complete geometries themselves. These convex hulls are p…
-
Hi @ruanchaves, I finally found a free slot to go through the tutorial. Here's a detailed review:
1. I'd try to streamline the preprocessing part. Maybe create a preprocessed dataset and move the p…
-
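I see that in most other languages it is far ahead of the runner-up, but in Chinese it falls well short of the best model. Has there been any analysis of why?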
-
Gazetteer unofficial entry "Washington, DC" and other forms are flagged as nonsense by the punct filter.
test / solution: if both match and test entry have ", " and are near exact matches, then the …
-
Hi,
I have a deduped list and want to run dedupe on a new file, appending to the clusters I already created without changing the clusterId.
Is there a way to do that? I do not want to run the dedupe …
-
_(Whoops—originally posted this as dedupeio/dedupe-examples#101, whereas it is probably better for this repo. Moving it over here instead.)_
Thanks for this awesome project! We're really exci…
-
https://github.com/dedupeio/dedupe/blob/65252112844f9951c33a67bcc10a20a6617b160e/dedupe/core.py#L316
Why is the smaller_ids information (on this line, the third parameter of the tuples) simply ign…
-
When using a Gazetteer instance to find matches in a dataset that has a model that uses a field that requires a corpus, if the value being searched for is not already in the corpus, Dedupe will return…