caikit / caikit-nlp

Apache License 2.0
12 stars 46 forks

Text Embedding: Fix concurrency errors #291

Closed markstur closed 9 months ago

markstur commented 9 months ago

Using the tokenizer for truncation before encoding with the sentence-transformers model hits "Already borrowed" errors under concurrent requests, because the fast (Rust-backed) tokenizer is not safe to share across Python threads.

Note: The truncation code requires a fast tokenizer. This PR does not change that; it is left as a to-do for the future.
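
For readers unfamiliar with this failure mode, here is a minimal sketch of one common mitigation: serializing access to a shared tokenizer with a lock. It is not necessarily the approach taken in this PR. `FakeFastTokenizer`, `LockedTokenizer`, and `truncate_many` are hypothetical names made up for illustration, and the stand-in tokenizer only imitates the "Already borrowed" behavior (raised here as a `RuntimeError`, which is an assumption) so the example runs without any model download.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeFastTokenizer:
    """Hypothetical stand-in for a Rust-backed fast tokenizer.

    It raises RuntimeError("Already borrowed") if a second thread enters
    __call__ while another thread is still inside it. This only imitates
    the failure described above; it is not the real tokenizer.
    """

    def __init__(self):
        self._busy = threading.Lock()

    def __call__(self, text, max_length):
        # Non-blocking acquire: fail instead of waiting, like a borrow check.
        if not self._busy.acquire(blocking=False):
            raise RuntimeError("Already borrowed")
        try:
            # Toy "truncation": whitespace tokens cut to max_length.
            return text.split()[:max_length]
        finally:
            self._busy.release()


class LockedTokenizer:
    """Wraps a tokenizer so only one thread calls it at a time."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Blocking acquire: concurrent callers wait their turn.
        with self._lock:
            return self._tokenizer(*args, **kwargs)


def truncate_many(tokenizer, texts, max_length, workers=8):
    """Truncate texts from several threads that share one tokenizer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: tokenizer(t, max_length), texts))


if __name__ == "__main__":
    shared = LockedTokenizer(FakeFastTokenizer())
    results = truncate_many(shared, ["one two three four"] * 100, max_length=2)
    print(len(results), results[0])
```

The trade-off of a lock is that tokenization becomes sequential across threads. Giving each thread its own tokenizer copy avoids that contention at the cost of extra memory.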