ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
2.05k stars 66 forks

Aggregate a plain non-synthetic dataset for Bio sequences #91

Closed by ashvardanian 1 month ago

ashvardanian commented 6 months ago

For fair benchmarks of Needleman-Wunsch scoring algorithms, we should find a real-world protein bank and, ideally, export it into a whitespace- or newline-delimited .txt file that is easy to parse not only in Python but also in C++. Community contributions are more than welcome 🤗
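As a starting point, here is a minimal sketch of the export step. It assumes the protein bank is distributed as a FASTA file, which the issue does not specify, and the names `fasta_to_lines` and `export` are hypothetical helpers rather than part of StringZilla. Each record's header line is dropped and its wrapped sequence lines are joined, so the output holds one sequence per line.

```python
from typing import Iterable, Iterator


def fasta_to_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one joined sequence per FASTA record (assumed input format)."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A header starts a new record: flush the previous sequence, if any.
            if chunks:
                yield "".join(chunks)
                chunks = []
        else:
            chunks.append(line)  # sequence data may be wrapped over many lines
    if chunks:
        yield "".join(chunks)


def export(fasta_path: str, txt_path: str) -> int:
    """Write one sequence per line to txt_path and return the record count."""
    count = 0
    with open(fasta_path) as src, open(txt_path, "w") as dst:
        for sequence in fasta_to_lines(src):
            dst.write(sequence + "\n")
            count += 1
    return count
```

Streaming line by line keeps memory use flat for large banks, and a newline-delimited output can be read back in C++ with a plain `std::getline` loop.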