-
### Is your proposal related to a problem?
I need to find similarities in company names. Often, words are added ("limited", etc.) or removed (e.g. the first name in personal companies). This can significan…
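
To make the problem concrete, here is a minimal sketch of one possible approach: strip common legal-form words before comparing. The `LEGAL_SUFFIXES` set and the helper names `normalize` and `name_similarity` are hypothetical, and `difflib.SequenceMatcher` from the standard library is only a stand-in for whatever similarity measure is eventually chosen. It does not handle the removed-first-name case.

```python
import re
from difflib import SequenceMatcher

# Illustrative list only; a real one would depend on the data and jurisdiction.
LEGAL_SUFFIXES = {"limited", "ltd", "inc", "llc", "corp", "co"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized forms of two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    print(normalize("Acme Widgets Limited"))
    print(name_similarity("Acme Widgets Limited", "ACME Widgets Ltd."))
```

Normalizing first means the similarity score reflects the distinctive part of the name rather than the boilerplate suffix.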
-
How does vector embedding perform compared to fuzzy string matching?
I’m exploring string-matching measures such as Levenshtein distance and Jaro-Winkler, among others.
Vector embedding seems similar i…
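
For reference, a minimal pure-Python sketch of the Levenshtein distance mentioned above, using the standard two-row dynamic-programming formulation. The function names `levenshtein` and `normalized_similarity` are illustrative, not taken from any particular library; an optimized package would normally be used in practice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Scale the distance to a similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(normalized_similarity("kitten", "sitting"))
```

Unlike an embedding, this score depends only on the characters, so two names with the same meaning but different spelling will still look far apart.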
-
- Train word2vec on raw/clean data
- Train doc2vec on raw/clean data
-
```
import pandas as pd
import numpy as np
import recordlinkage
df = pd.DataFrame(
{"col1": np.nan,
"col2": np.nan,
"col3": "block_string",
},
index = np.arange(10)
…
-
This gem is a dependency of `rubocop`. My development machine is a Mac; my CI machine is Linux. What would be the best practice regarding the gem for these different environments?
For example, if I…
-
Now that the package is getting more mature, it would be nice to add support for other distance metrics (specifically, Hamming and cosine distances). These should be relatively easy to implement follo…
-
Assuming I have `~/Documents` and the teleport `home:~`, then `goto hom/Diocuemants` should take me there. Perhaps hide it behind some configuration?
For example
```
$ goto --config set AllowFuzzy…
-
Dear James,
One of the bottlenecks of fuzzy comparison between two large lists of size n and m is that it has O(n*m) runtime as the comparison function -- e.g. jellyfish.jaro_winkler -- has to be c…
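
The quadratic cost described above can be illustrated with a small sketch. It counts how many times the comparison function is called in the naive all-pairs case, and contrasts that with blocking on a cheap key, which is one common mitigation and not necessarily what this issue goes on to propose. The helper names, the first-letter blocking key, and the threshold are assumptions; `difflib.SequenceMatcher` stands in for `jellyfish.jaro_winkler` so the example runs with the standard library only.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for the comparison function named in the issue.
    return SequenceMatcher(None, a, b).ratio()


def compare_all(left, right, threshold=0.8):
    """Naive approach: every left item against every right item (n*m calls)."""
    calls, matches = 0, []
    for a in left:
        for b in right:
            calls += 1
            if similarity(a, b) >= threshold:
                matches.append((a, b))
    return matches, calls


def compare_blocked(left, right, threshold=0.8, key=lambda s: s[:1].lower()):
    """Compare only pairs that share a blocking key (here: first letter).

    This can miss true matches whose keys differ, so the key is a trade-off.
    """
    blocks = defaultdict(list)
    for b in right:
        blocks[key(b)].append(b)
    calls, matches = 0, []
    for a in left:
        for b in blocks.get(key(a), []):
            calls += 1
            if similarity(a, b) >= threshold:
                matches.append((a, b))
    return matches, calls


if __name__ == "__main__":
    left = ["Acme Limited", "Bolt Ltd", "Cargo Inc"]
    right = ["Acme Ltd", "Bolt Limited", "Delta Corp"]
    print(compare_all(left, right)[1], "calls without blocking")
    print(compare_blocked(left, right)[1], "calls with blocking")
```

With blocking, the number of calls depends on block sizes rather than on the full product of the list lengths.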
-
So I am going to make a larger pull request on this, but I noticed there were some optimization problems with the gamma*() functions.
**Avoidance of factors**
I notice you coerce the inputs into …
-
I found this excellent gem! 💎
It seems like there are a few Ruby gems for string matching:
- https://github.com/kiyoka/fuzzy-string-match
- https://github.com/flori/amatch
- https://github.c…