IAPark / tiktoken_ruby

Unofficial Ruby binding for tiktoken by way of Rust
MIT License

Tiktoken.encoding_for_model has bad return value on unrecognized model name #28

Closed: rob-mindtrip closed this issue 3 months ago

rob-mindtrip commented 4 months ago

When you pass an unrecognized model name to Tiktoken.encoding_for_model, instead of returning nil it returns an unexpected Hash.

irb(main):001:0> require 'tiktoken_ruby'
=> true
irb(main):002:0> Tiktoken.encoding_for_model 'foo'
=> 
{:"gpt-4-"=>"cl100k_base",
 :"gpt-3.5-turbo-"=>"cl100k_base",
 :"gpt-35-turbo-"=>"cl100k_base",
 :"ft:gpt-4"=>"cl100k_base",
 :"ft:gpt-3.5-turbo"=>"cl100k_base",
 :"ft:davinci-002"=>"cl100k_base",
 :"ft:babbage-002"=>"cl100k_base"}

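The bug seems to have been introduced by https://github.com/IAPark/tiktoken_ruby/commit/0c1a45b46af6ed250d2eb82303594dab5dc7d813, where a Hash#each loop ends up as the last expression evaluated. Hash#each returns the hash itself, so when nothing is found that hash becomes the return value.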
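
To illustrate the suspected mechanism, here is a minimal sketch that does not use the gem at all. PREFIX_TO_ENCODING, buggy_lookup and fixed_lookup are hypothetical names made up for this example, and the fix shown is only one possible approach, not necessarily what the library does.

```ruby
# Hypothetical stand-in for a prefix table; NOT the gem's actual source.
PREFIX_TO_ENCODING = {
  "gpt-4-": "cl100k_base",
  "gpt-3.5-turbo-": "cl100k_base"
}.freeze

# Buggy shape: if no prefix matches, the `each` call is the last expression
# evaluated, and Hash#each returns its receiver (the whole hash).
def buggy_lookup(model_name)
  PREFIX_TO_ENCODING.each do |prefix, encoding|
    return encoding if model_name.start_with?(prefix.to_s)
  end
end

# One possible fix (an assumption, not the merged patch): Enumerable#find
# returns nil when nothing matches, so the method returns nil too.
def fixed_lookup(model_name)
  _prefix, encoding = PREFIX_TO_ENCODING.find do |prefix, _encoding|
    model_name.start_with?(prefix.to_s)
  end
  encoding
end
```

With these definitions, buggy_lookup('foo') hands back the whole table, while fixed_lookup('foo') returns nil. Both return "cl100k_base" for a name such as 'gpt-4-0613'.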

ScotterC commented 4 months ago

Good find! What would you expect the response to be? nil probably?

rob-mindtrip commented 4 months ago

Yeah, I think that would make sense.

ScotterC commented 4 months ago

Fantastic. If you have a moment, a failing spec would be a great contribution. The repo is a bit between owners, but this is simple enough that it'd be an easy merge.

IAPark commented 3 months ago

Fixed by #29