MrPowers / ceja

PySpark phonetic and string matching algorithms
MIT License
35 stars 5 forks

Accelerating string operations #6

Open ashvardanian opened 9 months ago

ashvardanian commented 9 months ago

I've noticed that ceja relies on jellyfish for Levenshtein distance computations, which opens an optimization opportunity. StringZilla should be a few times faster for that operation, and may be very handy for other tasks as well 🤗

MrPowers commented 9 months ago

Thanks! Feel free to create a PR! Perhaps you could add some benchmarks as well!
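
A minimal sketch of what such a benchmark could look like, assuming a small harness that checks each candidate against a pure-Python reference before timing it. The names `levenshtein_reference` and `compare` are hypothetical and not part of ceja. `jellyfish.levenshtein_distance` is registered only if jellyfish is installed; a StringZilla edit-distance function could be added to `candidates` the same way, but its exact entry point is not named here and should be taken from the StringZilla documentation.

```python
import timeit
from typing import Callable, Dict, List, Tuple


def levenshtein_reference(a: str, b: str) -> int:
    """Pure-Python two-row dynamic-programming edit distance (correctness baseline)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def compare(candidates: Dict[str, Callable[[str, str], int]],
            pairs: List[Tuple[str, str]],
            repeat: int = 3) -> Dict[str, float]:
    """Verify every candidate against the reference, then return best-of-`repeat` seconds."""
    expected = [levenshtein_reference(a, b) for a, b in pairs]
    timings: Dict[str, float] = {}
    for name, fn in candidates.items():
        if [fn(a, b) for a, b in pairs] != expected:
            raise ValueError(f"{name} disagrees with the reference implementation")
        timings[name] = min(timeit.repeat(
            lambda: [fn(a, b) for a, b in pairs], number=1, repeat=repeat))
    return timings


candidates: Dict[str, Callable[[str, str], int]] = {"reference": levenshtein_reference}
try:
    import jellyfish  # optional: only benchmarked when installed
    candidates["jellyfish"] = jellyfish.levenshtein_distance
except ImportError:
    pass

if __name__ == "__main__":
    sample_pairs = [("kitten", "sitting"), ("flaw", "lawn"), ("spark", "sparks")] * 1000
    for name, seconds in compare(candidates, sample_pairs).items():
        print(f"{name}: {seconds:.4f} s")
```

Checking results before timing keeps the comparison honest: a faster library only counts if it returns the same distances. Timings on short ASCII pairs like these say little about long or non-ASCII strings, so a real benchmark would want input that resembles the data ceja users actually match.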