-
Let s_vec be a vector of N distinct strings. When N is large, the N x N matrix produced by stringdistmatrix becomes unwieldy, as does the "dist" object that stringdistmatrix returns when called with a single argument.
Would …
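To put "unwieldy" in numbers, here is a rough size estimate. It assumes each distance is stored as an 8-byte double and ignores object overhead. A dense N x N matrix holds N² entries. A "dist" object keeps only the lower triangle, N(N-1)/2 entries, so it is about half the size but still grows quadratically.

```python
# Rough memory estimate for pairwise string distances, assuming 8-byte doubles.
BYTES_PER_DOUBLE = 8


def full_matrix_bytes(n: int) -> int:
    """Dense N x N distance matrix: N * N entries."""
    return n * n * BYTES_PER_DOUBLE


def dist_object_bytes(n: int) -> int:
    """Lower triangle without the diagonal: N * (N - 1) / 2 entries."""
    return n * (n - 1) // 2 * BYTES_PER_DOUBLE


for n in (10_000, 100_000, 1_000_000):
    print(f"N={n:>9,}: matrix {full_matrix_bytes(n) / 1e9:10.1f} GB, "
          f"dist {dist_object_bytes(n) / 1e9:10.1f} GB")
```

At N = 100,000 the triangle alone is roughly 40 GB under these assumptions, so halving the storage does not solve the problem once N gets large.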
-
When I `pip install ceja`, pip automatically downloads
pyspark-3.1.1.tar.gz (212.3MB)
which is a problem because it's the wrong version (I'm using 3.0.0 on both EMR and WSL).
Even when I eliminate it, I still g…
-
Hi,
We recently used fuzzy_match and found that it produces some pretty odd name matches. Instead of matching "art" to "Artem", it chooses "Karl".
Could you please add an option "Match onl…
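One plausible explanation, assuming fuzzy_match ranks candidates by plain Levenshtein edit distance (the issue does not say which metric it uses), is that "Karl" really is closer under that metric. The sketch below computes the distance with the standard dynamic-programming recurrence. Case-sensitively, "art" to "Karl" costs 2 (insert "K", substitute "t" with "l"). "art" to "Artem" costs 3 (the capital "A" mismatch plus two insertions). Even after lowercasing, both distances are 2. A pure edit-distance ranking therefore cannot prefer "Artem" without some additional rule.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


query = "art"
for name in ("Artem", "Karl"):
    # case-sensitive distance, then case-insensitive distance
    print(name, levenshtein(query, name), levenshtein(query.lower(), name.lower()))
```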
-
Dear James,
One of the bottlenecks of fuzzy comparison between two large lists of sizes n and m is its O(n*m) runtime, because the comparison function (e.g. jellyfish.jaro_winkler) has to be c…
-
I don't think any of us were aware of https://jsr.io/@std/text/doc/~/closestString, and we might want to swap out our own implementation for theirs.
Perhaps we should benchmark it and then cont…
-
As described on the RapidFuzz GitHub page (https://github.com/rapidfuzz/RapidFuzz):
RapidFuzz is a fast string matching library for Python and C++, which is using the string similarity calculations fro…
-
Related issue https://github.com/arduino/Arduino/issues/6646
-
Hi,
I am trying to perform deduplication on a database with 1.8M records. The analysis has been running for ~10 days on an 8-core machine with 32 GB of RAM. Do you believe this task can be achieved on s…
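For scale: if every unordered pair of records is scored, with no blocking or indexing (an assumption, since the issue does not say how candidate pairs are generated), 1.8M records produce roughly 1.6 × 10¹² pairs. The sketch below computes that count and the comparison rate needed to finish within the ~10 days mentioned. It uses only the figures from the issue, with no measured throughput.

```python
n_records = 1_800_000
cores = 8
days = 10

# Every unordered pair of distinct records is compared exactly once.
pairs = n_records * (n_records - 1) // 2

# Comparisons per second required to score all pairs within `days` days.
needed_per_second = pairs / (days * 86_400)

print(f"{pairs:,} pairs; {needed_per_second:,.0f} comparisons/s in total, "
      f"{needed_per_second / cores:,.0f} per core")
```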
-
"make" produces a *lot* of warnings. In there somewhere there is probably an error, since I also see this:
mv -f .deps/build_trans_table-transliteration_table_builder.Tpo .deps/build_trans_table-t…
-
I've been trying to get this to install on Windows using MSYS2 for 2 days. I'm a noob at this stuff, so please bear with me.
I'm using the **MSYS2 MinGW 64-bit terminal**.
_______________________…