-
Right now, the match methods of `Dedupe` and `RecordLink` return clusters like this:
```python
[(tuple_of_ids_in_a_cluster, tuple_of_corresponding_scores),
...]
```
But the Gazetteer class's match m…
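
For illustration, here is a minimal sketch of how a result in the `(ids, scores)` shape described above could be unpacked. The sample IDs and scores are made up, and `cluster_scores` is a hypothetical helper, not part of the library:

```python
# Made-up sample data in the shape described above:
# each cluster is (tuple of record IDs, tuple of corresponding scores).
clusters = [
    ((1, 2, 3), (0.95, 0.90, 0.85)),
    ((4, 5), (0.80, 0.75)),
]


def cluster_scores(clusters):
    """Map each record ID to its score (hypothetical helper)."""
    return {
        record_id: score
        for ids, scores in clusters
        for record_id, score in zip(ids, scores)
    }


for cluster_number, (ids, scores) in enumerate(clusters):
    for record_id, score in zip(ids, scores):
        print(cluster_number, record_id, score)
```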
-
See: http://localhost:8787/exist/apps/srophe/spear/aggregate.html?id=http://syriaca.org/person/2032
Some names are not showing up.
-
Hello.
I can't figure out why the gazetteer doesn't match the single name 'Barack'.
```python
import spacy, re
from skweak import heuristics, gazetteers, aggregation, utils, base
nlp = spacy.load("en_core_web_…
-
Hello, when I am running the minimal example:
```python
import spacy, re
from skweak import heuristics, gazetteers, generative, utils
# LF 1: heuristic to detect occurrences of MONEY entities…
-
I have a sample where there are often several very similar candidates in the clean Gazetteer dataset (as a consequence of multiple addresses/names, as in #398).
Now, I am wondering, if I encounter the…
-
I get a poor recall rate when running a Gazetteer example which I've created based on pgsql_big_dedupe_example.
The code for reproducing the problem is attached. Please see the "instructions.txt" fil…
-
Currently the source field is a free-form string field with no controls.
@geoffj-FUG has identified a requirement to limit the content to a set of predefined values that are established by the…
-
### Your to-do
Document the below decisions and structure.
What should the API structure look like? E.g.:
1. Resource first
* _/path-to-app/hammersmith/map_
* _/path-to-app/hammersm…
-
We have multiple records in the messy data (multiple sets of duplicate records) and a corresponding canonical set. When we execute the gazetteer process, the output file has only 1 record mapping to canonical reco…
-
Now, the indices that are used in blocking can be saved and restored. This only really seems useful for gazetteer classes.
With the file-backed blocking introduced in 1.6.8, this becomes a lot mor…