Anush008 / fastembed-rs

Library for generating vector embeddings and reranking, in Rust
https://docs.rs/fastembed
Apache License 2.0

feat: Allow tokenizer to be used to count tokens and limit tokens #10

Closed: prattcmp closed this 9 months ago

prattcmp commented 9 months ago

Right now the tokenizer is private, so it can't be used to count the number of tokens in a passage or to chunk passages to a specific token limit.

This PR moves towards resolving that.
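
For context, here is a minimal sketch of the two use cases, written against the Hugging Face `tokenizers` crate that fastembed-rs builds on. The `count_tokens` and `chunk_by_tokens` helpers are hypothetical illustrations, not part of the fastembed API; they assume you can obtain a `&Tokenizer` handle once this PR exposes it.

```rust
use tokenizers::Tokenizer;

/// Count the tokens a tokenizer produces for a passage.
/// (Hypothetical helper; not part of the fastembed API.)
fn count_tokens(tokenizer: &Tokenizer, passage: &str) -> tokenizers::Result<usize> {
    // `false` skips special tokens such as [CLS]/[SEP].
    let encoding = tokenizer.encode(passage, false)?;
    Ok(encoding.get_ids().len())
}

/// Split a passage into chunks of at most `max_tokens` tokens each,
/// decoding every chunk back into text.
/// (Hypothetical helper; not part of the fastembed API.)
fn chunk_by_tokens(
    tokenizer: &Tokenizer,
    passage: &str,
    max_tokens: usize,
) -> tokenizers::Result<Vec<String>> {
    let encoding = tokenizer.encode(passage, false)?;
    encoding
        .get_ids()
        .chunks(max_tokens) // split the token-ID slice into fixed-size windows
        .map(|ids| tokenizer.decode(ids, true))
        .collect()
}
```

Chunking on token IDs rather than on characters guarantees each chunk stays within the model's sequence limit exactly, which is the point of exposing the tokenizer.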

Anush008 commented 9 months ago

Thank you.

github-actions[bot] commented 9 months ago

🎉 This PR is included in version 1.10.0 🎉

The release is available on:

Your semantic-release bot 📦🚀