-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8421191/ e.g.
We are already kind of doing very basic word2vec with the numpy version of remove_redundancies that Adam wrote; can we use an algorithm t…
-
When the user enters a Property Name, check to see if similar names already exist in the same ecosystem. (Not sure how to define "similar" and detect them, maybe there's a standard text search functio…
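One possible way to detect "similar" names is Python's standard-library `difflib.get_close_matches`. The sketch below is only an illustration under assumptions: the helper name `find_similar_names`, the case-insensitive normalization, and the `0.8` cutoff are hypothetical choices, not anything the project already defines.

```python
from difflib import get_close_matches


def find_similar_names(new_name, existing_names, cutoff=0.8, limit=5):
    """Return existing names that look similar to new_name (hypothetical helper)."""
    # Compare case-insensitively, but report names in their original spelling.
    normalized = {}
    for name in existing_names:
        normalized.setdefault(name.strip().lower(), name)

    # get_close_matches ranks candidates by difflib.SequenceMatcher ratio
    # and drops anything scoring below the cutoff.
    hits = get_close_matches(
        new_name.strip().lower(), list(normalized), n=limit, cutoff=cutoff
    )
    return [normalized[hit] for hit in hits]


if __name__ == "__main__":
    print(find_similar_names("Colour", ["color", "Size", "weight"]))
```

The cutoff would need tuning against real Property Names; a lower value flags more candidates at the cost of more false positives.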
-
So I am going to make a larger pull request on this, but I noticed there were some optimization problems with the gamma*() functions.
**Avoidance of factors**
I notice you coerce the inputs into …
-
Regression tests fail with PG12beta1 because floating point output is now more precise by default:
```
16:50:42 --- /tmp/autopkgtest.SxiwfA/tree/expected/test1.out 2019-05-21 14:50:21.000000000 +000…
```
df7cb updated
5 years ago
-
This library is described as fuzzy string matching with Levenshtein distance. However, it doesn't seem to use Levenshtein at all?
fuzz.ratio("tide", "diet") returns:
- 50 with python-Levenshtein i…
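
For comparison, here is a minimal sketch of two edit-distance variants. It assumes, without confirmation from the library's source, that a score of 50 could come from a distance where a substitution counts as a delete plus an insert, normalized by the combined string length. The function names are hypothetical.

```python
def edit_distance(a, b, substitution_cost=1):
    """Dynamic-programming edit distance with a configurable substitution cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(
                min(
                    prev[j] + 1,  # delete ca
                    curr[j - 1] + 1,  # insert cb
                    prev[j - 1] + (0 if ca == cb else substitution_cost),
                )
            )
        prev = curr
    return prev[-1]


def levenshtein(a, b):
    # Classic Levenshtein: insert, delete, and substitute each cost 1.
    return edit_distance(a, b, substitution_cost=1)


def indel_ratio(a, b):
    # Assumed normalization: substitution costs 2 (delete + insert),
    # and the result is scaled by the combined length of both strings.
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * (total - edit_distance(a, b, substitution_cost=2)) / total


if __name__ == "__main__":
    print(levenshtein("tide", "diet"), indel_ratio("tide", "diet"))
```

Under these assumptions, "tide" and "diet" are 3 classic Levenshtein edits apart, while the insert/delete-only variant gives a distance of 4 out of a combined length of 8.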
-
I am trialing a RegEx feature for openSquat.
`git clone https://github.com/atenreiro/regex_opensquat`
1- Install the dependencies listed in requirements.txt
2- Modify keywords.txt
3- In the regex_mult…
-
Add handling for poorly named files:
- [x] Synthesise poorly named files
- [x] Update classifier logic to factor in poorly named files
-
### Is your proposal related to a problem?
I would like to be able to use Splink with embedding-based similarity functions, specifically with duckdb and Athena backends.
For example, to evaluate…
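
As a rough illustration of what an embedding-based similarity function computes, here is a plain-Python cosine similarity. This is a sketch only: it is not Splink's API and not a duckdb or Athena function, and the name `cosine_similarity` is chosen for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity is undefined for a zero vector; treat it as no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A comparison level would then threshold this score, for example treating pairs above some chosen value as a match.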
-
The new Rust backend appears to cause a steep performance regression in the hamming implementation:
## Old
![hamming_old](https://user-images.githubusercontent.com/44199644/233227647-b342…
-
Whenever Taiga is unable to identify a title, the list of suggestions is almost always wrong, and also pretty surprising.
An example:
Filename `[SubsPlease] Shin no Nakama - 01 (720p) [C98D24A8].…