bystrogenomics / bystro

Natural Language Search and Analysis of High Dimensional Genomic Data
Mozilla Public License 2.0

Rust or C++ annotator #79

Open akotlar opened 4 years ago

akotlar commented 4 years ago

I think it is clear that, long term, a faster language would be useful, especially one that gives us more opportunities to integrate as a library with other software. The initial goal is to replicate just the annotator, exposing a library, a Rust CLI tool, and Python and Perl integrations. The Python and Perl integrations will use existing FFI interfaces, so that anyone using Bystro from those languages today can continue to do so.

Long term, this version should integrate with the existing Python ecosystem, especially popular data formats like Parquet. It should also support datasets too large to fit in a single file, aiming for no more than 5 TB per chunk (the S3 limit).
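
To make the chunking goal concrete, here is a rough sketch of how a dataset could be split into pieces that stay under the per-chunk cap. The names `MAX_CHUNK_BYTES` and `plan_chunks` are hypothetical, and the cap is assumed to be 5 TiB; a real implementation would presumably cut at record or row-group boundaries rather than at raw byte offsets.

```rust
/// Per-chunk cap taken from the 5 TB figure above, interpreted here as 5 TiB
/// (an assumption made for this sketch).
pub const MAX_CHUNK_BYTES: u64 = 5 * 1024 * 1024 * 1024 * 1024;

/// Hypothetical helper: splits `total_bytes` into half-open byte ranges
/// `(start, end)`, each at most `max_chunk_bytes` long.
pub fn plan_chunks(total_bytes: u64, max_chunk_bytes: u64) -> Vec<(u64, u64)> {
    assert!(max_chunk_bytes > 0, "chunk size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < total_bytes {
        // Clamp the end of the range to the end of the dataset.
        let end = total_bytes.min(start.saturating_add(max_chunk_bytes));
        ranges.push((start, end));
        start = end;
    }
    ranges
}
```

Under these assumptions, a 12 TiB dataset would be planned as three chunks: two full 5 TiB chunks and one 2 TiB remainder.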

Let's track development here.

akotlar commented 4 years ago

Instead of a stop-the-world rewrite, I propose we replace portions of the existing annotator with Rust piecemeal, calling the Rust code from Perl via https://metacpan.org/pod/FFI::Platypus
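
As a rough illustration of what one such replaced piece could look like on the Rust side, the sketch below exports a single function with a C ABI, which is the kind of symbol an FFI layer can bind to. The function name `bystro_count_alt_alleles` and its behavior (counting comma-separated alleles in a VCF-style ALT field) are hypothetical and chosen only for the example; they are not part of the existing annotator.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Hypothetical helper: counts comma-separated alleles in an ALT field.
/// Returns -1 if the pointer is null or the bytes are not valid UTF-8.
///
/// # Safety
/// `alt` must be null or point to a valid NUL-terminated string.
// Note: in Rust edition 2024 this attribute is written `#[unsafe(no_mangle)]`.
#[no_mangle]
pub unsafe extern "C" fn bystro_count_alt_alleles(alt: *const c_char) -> c_int {
    if alt.is_null() {
        return -1;
    }
    // The caller guarantees a valid NUL-terminated string (see Safety above).
    let field = unsafe { CStr::from_ptr(alt) };
    match field.to_str() {
        Ok(s) if s.is_empty() => 0,
        Ok(s) => s.split(',').count() as c_int,
        Err(_) => -1,
    }
}
```

Built as a `cdylib` crate, a function like this could then be attached from Perl by name, taking a string and returning an int, while the rest of the annotator stays in Perl until its turn comes.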

akotlar commented 4 years ago

Here's a relevant package written in Rust: https://github.com/meilisearch/MeiliSearch/blob/cde884514387e9d8656a95e59d564378fe4d229b/meilisearch-core/src/database.rs (it uses LMDB as the backing store for a search engine).

Bystro is somewhat analogous: a genomics search engine backed by LMDB.