Using the tokenizer for truncation before encoding with the sentence-transformers model hits "Already borrowed" errors, because the fast (Rust) tokenizer is not safe to share across Python threads.
- Add a retry to fix this problem and guard against similar problems in the future (see the retry sketch after this list):
  - A simple retry loop usually fixes the problem immediately.
  - Sleep briefly between attempts, with a delay that increases each time.
  - Add `RETRY_COUNT` so the maximum number of attempts can be changed if needed.
- Use a separate tokenizer instance for truncation (sketch below):
  - A separate copy of the tokenizer avoids the concurrency problem as reproduced so far.
- Don't use `decode()` (sketch below):
  - Using the tokenizer less should also help with the issues above.
  - The truncated string now matches the original text exactly; with `decode()` it was an inexact reconstruction, inconsistent with how non-truncated text is handled.
- Add a `BATCH_SIZE` env var so the sentence-transformers `batch_size` can be set for performance testing/tuning (sketch below).
- More tests.
- Use a loaded model in tests instead of a bootstrapped model.
Note: The truncation code requires a fast tokenizer (unchanged by this PR, but a to-do for the future).
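A minimal sketch of the retry idea. The helper name `with_retry`, the default of 3 attempts, and the initial delay are illustrative assumptions, not this PR's actual code; only the `RETRY_COUNT` knob and the increasing sleep come from the list above.

```python
import os
import time

# RETRY_COUNT is the knob described above; the default of 3 is an assumption.
RETRY_COUNT = int(os.environ.get("RETRY_COUNT", "3"))

def with_retry(fn, *args, **kwargs):
    """Call fn, retrying when the fast tokenizer raises "Already borrowed"."""
    delay = 0.01  # small initial sleep that grows on each failed attempt
    for attempt in range(RETRY_COUNT):
        try:
            return fn(*args, **kwargs)
        except RuntimeError as err:
            # Only retry the known concurrency error; re-raise anything else,
            # and re-raise once the final attempt has failed.
            if "Already borrowed" not in str(err) or attempt == RETRY_COUNT - 1:
                raise
            time.sleep(delay)
            delay *= 2
```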
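A sketch of the separate-instance idea, assuming the tokenizer supports `copy.deepcopy` (reloading with `AutoTokenizer.from_pretrained` would work as well). The model name is illustrative.

```python
from copy import deepcopy

from transformers import AutoTokenizer

# Illustrative model; substitute whatever the service actually loads.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Give truncation its own copy so it never borrows the same Rust tokenizer
# that sentence-transformers uses internally during encode().
truncation_tokenizer = deepcopy(tokenizer)
```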
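One way to truncate without `decode()` is to slice the original string using the fast tokenizer's character offsets, so the truncated text is an exact prefix of the input rather than a decoded reconstruction. A sketch with an illustrative function name and model; skipping special tokens here is an assumption.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def truncate_text(text: str, max_tokens: int) -> str:
    """Cut text to at most max_tokens tokens by slicing the original string."""
    encoding = tokenizer(
        text,
        truncation=True,
        max_length=max_tokens,
        add_special_tokens=False,
        return_offsets_mapping=True,  # character offsets need a fast tokenizer
    )
    offsets = encoding["offset_mapping"]
    if not offsets:
        return ""
    # offsets[-1][1] is the end position, in `text`, of the last kept token.
    return text[: offsets[-1][1]]
```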
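A sketch of the `BATCH_SIZE` knob. The fallback of 32 (the sentence-transformers default) and the model name are assumptions.

```python
import os

from sentence_transformers import SentenceTransformer

# BATCH_SIZE is the env var described above; 32 is an assumed fallback
# matching the sentence-transformers default.
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", "32"))

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model
embeddings = model.encode(["first text", "second text"], batch_size=BATCH_SIZE)
```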