IAPark / tiktoken_ruby

Unofficial ruby binding for tiktoken by way of rust
MIT License
109 stars 26 forks source link

Thread safe? #8

Closed mkdynamic closed 3 months ago

mkdynamic commented 1 year ago

Is it possible that Tiktoken.encoding_for_model is not thread safe?

IAPark commented 9 months ago

I'm not really sure how I would check for that. I think it should be? No unsafe rust features were used and I'm aware of anything wrong. Did you see something specific

mkdynamic commented 9 months ago

I haven’t dug in deeply, but in a fairly heavily multi threaded app calling that method a lot results in some kind of deadlock for me (verified this with gdb, thread was stuck on that call).

My fix was to synchronize that call in a Mutex. And this solves the issue.

IAPark commented 9 months ago

Interesting, this is a very low priority for me, but maybe I'll try digging into it one of these weekends

viamin commented 4 months ago

It looks like it's this usage of memoization inside a class method in lib/tiktoken_ruby/encoding.rb:

  def self.for_name_cached(encoding)
    @encodings ||= {}
    @encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
  end

Memoization inside a class method is not thread-safe. The easy fix might be to use a thread-safe memoization library like memo_wise

bdegomme commented 3 months ago

Hi @IAPark, would you consider adding a warning in the readme saying that the library is not thread safe? This leads to tricky deadlocks (I had the same as this one https://github.com/sidekiq/sidekiq/issues/6084, took my 2-3 days to understand). Unfortunately I don't think I could fix this myself :grimacing: Thank you for the great gem!

mkdynamic commented 3 months ago

FWIW I think the fix is fairly simple, if the memoization is indeed the source of the thread safety issue.

Untested patch if it helps: https://github.com/IAPark/tiktoken_ruby/pull/30

IAPark commented 3 months ago

This should now be fixed in the latest version