-
This is useful for two reasons:
- It allows seeding non-exact matches, which in turn lets us capture potentially interesting matches where most (or none!) of the match is exact
- It helps us solve …
-
I don't really want people to have to worry about reference designators (refdes) inside the code. Part initializations should not be littered with `refdes="Uwhatever"`.
Right now Context.autoname, beyond numbering parts a…
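One way to keep `refdes=...` out of part initializations is per-prefix auto-numbering that still respects a few hand-picked designators. Below is a minimal sketch under that assumption; `AutoNamer`, `reserve`, and `name` are hypothetical names for illustration, not the project's actual `Context.autoname` behavior, which is only partially described above.

```python
from collections import defaultdict


class AutoNamer:
    """Hypothetical helper: assigns reference designators like U1, U2, R1 per prefix."""

    def __init__(self):
        self._next = defaultdict(lambda: 1)  # next number to try for each prefix
        self._used = set()                   # every designator already taken

    def reserve(self, refdes):
        # Record an explicitly chosen refdes so auto-numbering never reuses it.
        self._used.add(refdes)

    def name(self, prefix):
        # Take the lowest number for this prefix that is not already used.
        n = self._next[prefix]
        while f"{prefix}{n}" in self._used:
            n += 1
        refdes = f"{prefix}{n}"
        self._used.add(refdes)
        self._next[prefix] = n + 1
        return refdes
```

For example, after `reserve("U2")`, two calls to `name("U")` yield `U1` and then `U3`, while `name("R")` starts independently at `R1`.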
-
**Is your feature request related to a problem? Please describe.**
It is nice to be able to use `block_by` to filter out some comparisons before computing the string similarity. Currently, it is limi…
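Blocking partitions the records by a cheap key and only computes string similarity between records inside the same block. A minimal sketch in plain Python, assuming records are dicts, `block_by` is a function returning a block key, and `difflib.SequenceMatcher.ratio()` stands in for whatever similarity measure the library actually uses; `blocked_similarities` is a hypothetical helper, not the library's API.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocked_similarities(records, block_by, key, threshold=0.8):
    """Compare only records sharing a block key; return (i, j, score) above threshold.

    block_by: function mapping a record to its block key (assumed shape).
    key: function extracting the string to compare.
    """
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_by(rec)].append(idx)

    results = []
    for members in blocks.values():
        # Pairs in different blocks are never scored at all.
        for i, j in combinations(members, 2):
            score = SequenceMatcher(None, key(records[i]), key(records[j])).ratio()
            if score >= threshold:
                results.append((i, j, score))
    return results
```

Blocking on `zip`, for instance, means two records with different ZIP codes are skipped even if their names are identical, so the key should be something true matches are very unlikely to disagree on.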
-
https://arxiv.org/abs/2001.04451
-
[https://becominghuman.ai/extract-a-feature-vector-for-any-image-with-pytorch-9717561d1d4c](https://becominghuman.ai/extract-a-feature-vector-for-any-image-with-pytorch-9717561d1d4c)
-
When Git detects renames between `m` and `n` files, and when it detects similarities between `m` and `n` commits in `git range-diff`, it currently performs `m` times `n` comparisons, which is qu…
dscho updated 4 years ago
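A common way to avoid scoring all `m` times `n` pairs is an inverted index: only pairs that share at least one token get a full similarity score. The sketch below assumes files are compared as sets of non-blank lines using Jaccard similarity; it is a generic illustration, not how Git's rename detection or `git range-diff` actually computes similarity, and `candidate_pairs`, `detect_renames`, and `max_postings` are hypothetical names.

```python
from collections import defaultdict


def line_set(content):
    """Treat each distinct non-blank line as a token."""
    return {line for line in content.splitlines() if line.strip()}


def candidate_pairs(sources, dests, max_postings=50):
    """Return (source, dest) name pairs that share at least one non-frequent line."""
    index = defaultdict(set)  # line -> names of source files containing it
    for name, content in sources.items():
        for line in line_set(content):
            index[line].add(name)

    pairs = set()
    for dname, content in dests.items():
        for line in line_set(content):
            names = index.get(line, ())
            # Skip lines shared by too many files; they would link nearly every pair.
            if len(names) <= max_postings:
                for sname in names:
                    pairs.add((sname, dname))
    return pairs


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def detect_renames(sources, dests, threshold=0.5, max_postings=50):
    """Score only candidate pairs; return (source, dest, score) at or above threshold."""
    results = []
    for s, d in sorted(candidate_pairs(sources, dests, max_postings)):
        score = jaccard(line_set(sources[s]), line_set(dests[d]))
        if score >= threshold:
            results.append((s, d, score))
    return results
```

The cost now depends on how many pairs actually share content rather than on `m * n`; the `max_postings` cutoff keeps boilerplate lines (a lone `}`, say) from pulling the candidate set back toward all pairs.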
-
Hope we get this part done by Dec 2 midnight
1. Three components (dict, signature, and cluster) need to be able to save and load:
- Shingling & dictionary
- Minhash
- LSH - banding
2. Refere…
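The three stages listed above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the project's code: character 5-shingles, a shingle-to-integer dictionary, MinHash with random universal hash functions modulo a Mersenne prime, banding into `bands` groups of rows, and union-find to turn bucket collisions into clusters. JSON is an assumed file format for saving and loading the dict, signatures, and clusters; all names (`ShingleDict`, `minhash`, `lsh_clusters`, `save`, `load`) are hypothetical.

```python
import json
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus


def shingle(text, k=5):
    """Character k-shingles; texts shorter than k yield a single shingle."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


class ShingleDict:
    """Maps each distinct shingle to a small integer id."""

    def __init__(self, ids=None):
        self.ids = dict(ids or {})

    def encode(self, shingles):
        # New shingles get the next free id; known ones reuse theirs.
        return {self.ids.setdefault(s, len(self.ids)) for s in shingles}


def make_hashes(n, seed=0):
    # n random universal hash functions h(x) = (a*x + b) mod PRIME.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]


def minhash(ids, hashes):
    """MinHash signature: the minimum hashed id under each hash function."""
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]


def lsh_clusters(signatures, bands):
    """Band signatures into buckets, then union documents that share any bucket."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(list)
    for doc, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc)

    parent = {doc: doc for doc in signatures}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in buckets.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(list)
    for doc in signatures:
        clusters[find(doc)].append(doc)
    return sorted(sorted(c) for c in clusters.values())


def save(path, sdict, signatures, clusters):
    with open(path, "w") as f:
        json.dump({"dict": sdict.ids, "signatures": signatures, "clusters": clusters}, f)


def load(path):
    with open(path) as f:
        state = json.load(f)
    return ShingleDict(state["dict"]), state["signatures"], state["clusters"]
```

With `r` rows per band, two documents with Jaccard similarity `J` match on a given band with probability about `J^r`, so more bands raise recall and more rows per band raise precision. If `bands` does not divide the signature length, the leftover rows are ignored.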
-
We can split ANN algorithms into three distinct categories: trees, hashes, and graphs.
The following represent possible algorithmic approaches. For each approach there are typically variants. Note …
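As one example of the hash category, random-hyperplane LSH (SimHash) buckets vectors by which side of each random hyperplane they fall on, so vectors separated by a small angle tend to share a bucket. Below is a minimal single-table sketch in plain Python, not tied to any library; `HyperplaneLSH` and its methods are hypothetical names, and a practical index would use several tables to reduce missed neighbors.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Single hash table keyed by the sign pattern against n_bits random hyperplanes."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # Gaussian normal vectors give hyperplane orientations uniform on the sphere.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.table = defaultdict(list)

    def _key(self, vec):
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in self.planes)

    def add(self, name, vec):
        self.table[self._key(vec)].append((name, vec))

    def query(self, vec, k=1):
        """Rank only the vectors in the query's bucket by cosine similarity."""
        bucket = self.table.get(self._key(vec), [])
        ranked = sorted(bucket, key=lambda item: cosine(vec, item[1]), reverse=True)
        return [name for name, _ in ranked[:k]]
```

Fewer bits give larger buckets (better recall, more exact comparisons); more bits give smaller buckets (faster queries, more misses).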
-
In recent runs against the latest NCBI dataset of Listeria, we've observed large discrepancies between RabbitTClust and NCBI clustering results. Here are a few examples.
1. When distance threshold < …
-
There's a list here: http://ntz-develop.blogspot.co.uk/2011/03/fuzzy-string-search.html
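Edit distance is one standard building block for fuzzy string search. As a baseline, here is a minimal brute-force sketch: Levenshtein distance computed with the two-row dynamic program, followed by a linear scan that keeps candidates within `max_dist` edits. `fuzzy_search` and `max_dist` are hypothetical names; indexed approaches exist precisely to avoid scanning every candidate like this.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_search(query, candidates, max_dist=2):
    """Return candidates within max_dist edits, closest first."""
    scored = sorted((levenshtein(query, c), c) for c in candidates)
    return [c for d, c in scored if d <= max_dist]
```

Each distance costs O(len(a) * len(b)) time but only O(min(len(a), len(b))) memory, since only the previous row is kept.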