Closed mkdynamic closed 3 months ago
I'm not really sure how I would check for that. I think it should be? No unsafe rust features were used and I'm aware of anything wrong. Did you see something specific
I haven’t dug in deeply, but in a fairly heavily multi threaded app calling that method a lot results in some kind of deadlock for me (verified this with gdb, thread was stuck on that call).
My fix was to synchronize that call in a Mutex. And this solves the issue.
Interesting, this is a very low priority for me, but maybe I'll try digging into it one of these weekends
It looks like it's this usage of memoization inside a class method in lib/tiktoken_ruby/encoding.rb
:
def self.for_name_cached(encoding)
@encodings ||= {}
@encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
end
Memoization inside a class method is not thread-safe. The easy fix might be to use a thread-safe memoization library like memo_wise
Hi @IAPark, would you consider adding a warning in the readme saying that the library is not thread safe? This leads to tricky deadlocks (I had the same as this one https://github.com/sidekiq/sidekiq/issues/6084, took my 2-3 days to understand). Unfortunately I don't think I could fix this myself :grimacing: Thank you for the great gem!
FWIW I think the fix is fairly simple, if the memoization is indeed the source of the thread safety issue.
Untested patch if it helps: https://github.com/IAPark/tiktoken_ruby/pull/30
This should now be fixed in the latest version
Is it possible that
Tiktoken.encoding_for_model
is not thread safe?