Thank you for your sincere advice.
Thanks a lot, @Hime-Hina! Inspired by your demo, I integrated the latest tiktoken
library into my chatgpt-demo fork; you can view a live demo here. I reached a few conclusions about token counting.
In summary, the pseudo-formula can be written as:
$$
\begin{aligned}
\text{prompt tokens} &= \sum_{\texttt{msg}} \bigl( \texttt{encode(msg).length} + 4 \bigr) + 3 \\
\text{completion tokens} &= \texttt{encode(msg).length}
\end{aligned}
$$
Specifically, the 3 extra tokens for the conversation context are `<|im_start|>`, `assistant`, and `\n` (they prime the assistant's reply), and the 4 extra tokens for each message are `<|im_start|>`, the role/name, `\n`, and `<|im_end|>`.
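As a minimal sketch, the formula translates to something like the following; the `encode` function is assumed to come from a tiktoken encoder for the target model (cl100k_base for gpt-3.5-turbo), and the names are illustrative rather than my exact implementation:

```ts
// Sketch of the formula above. `encode` is assumed to be a tiktoken encoder's
// encode function for the target model (cl100k_base for gpt-3.5-turbo).
type Encode = (text: string) => Uint32Array | number[];

interface ChatMessage {
  role: string;
  content: string;
}

function countPromptTokens(messages: ChatMessage[], encode: Encode): number {
  // 3 tokens prime the reply: <|im_start|>, "assistant", "\n".
  let total = 3;
  for (const msg of messages) {
    // 4 extra tokens per message: <|im_start|>, role/name, "\n", <|im_end|>.
    total += encode(msg.content).length + 4;
  }
  return total;
}

function countCompletionTokens(completion: string, encode: Encode): number {
  // The completion usage appears to count only the content tokens.
  return encode(completion).length;
}
```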
I've compared the token count in the API response's header with the one I calculated myself, using Python and JavaScript respectively, and found no discrepancy. (Interestingly, I also found that the official tokenizer demo is actually a GPT-3 tokenizer, which encodes Chinese characters much less efficiently than gpt-3.5-turbo's.)
OpenAI also has a note on the markup language (ChatML) they created for conversations.
As you said, getting WASM to work on edge functions is incredibly tough; I spent almost half a day fighting bugs. In the end, I found that this approach works well on the self-host
route, which is similar to your solution in the dev
branch of your demo repo. But it doesn't work on Vercel or Netlify Edge Functions (serverless functions do work, but they can't stream responses). I finally solved it by using fetch to load the wasm file together with a dynamic import, roughly as sketched below.
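For reference, here is a minimal sketch of that pattern on an edge runtime, assuming @dqbd/tiktoken's lite build and that `tiktoken_bg.wasm` is served from a URL the function can fetch (e.g. copied into `public/`); the paths and handler shape are illustrative assumptions, not my exact code:

```ts
// Sketch: initialize @dqbd/tiktoken on an edge runtime by fetching the wasm
// binary at request time instead of bundling it.
import { init, Tiktoken } from "@dqbd/tiktoken/lite/init";
import cl100k_base from "@dqbd/tiktoken/encoders/cl100k_base.json";

export const config = { runtime: "edge" };

let wasmReady = false;

export default async function handler(req: Request): Promise<Response> {
  if (!wasmReady) {
    // Fetch the wasm binary (path is illustrative), compile it, and hand the
    // instance to tiktoken's init callback. Note that some edge runtimes
    // restrict compiling wasm from raw bytes, which is part of the hurdle
    // described above.
    const bytes = await fetch(new URL("/tiktoken_bg.wasm", req.url)).then((r) =>
      r.arrayBuffer()
    );
    const module = await WebAssembly.compile(bytes);
    await init((imports) => WebAssembly.instantiate(module, imports));
    wasmReady = true;
  }

  const { text } = (await req.json()) as { text: string };
  const encoder = new Tiktoken(
    cl100k_base.bpe_ranks,
    cl100k_base.special_tokens,
    cl100k_base.pat_str
  );
  const count = encoder.encode(text).length;
  encoder.free(); // release the wasm memory backing this encoder
  return new Response(JSON.stringify({ count }), {
    headers: { "content-type": "application/json" },
  });
}
```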
You can view my implementation through the following pages:
Clear and concise description of the problem
As the official cookbook How to stream completions notes:
Personally, I think it would be useful to implement that feature. Users wouldn't have to check the daily usage breakdown on their account page, and it would make for a more responsive and user-friendly experience.
Suggested solution
I have actually implemented that feature on the front-end already, using the @dqbd/tiktoken library, which is a third-party TypeScript version of the official tiktoken library. OpenAI also provides an example on how to count tokens with the tiktoken library. For specific implementation, please refer to my repo.
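For illustration, the core of the front-end counting comes down to a few lines, assuming @dqbd/tiktoken (the example string is a placeholder):

```ts
import { encoding_for_model } from "@dqbd/tiktoken";

// Get an encoder for gpt-3.5-turbo (cl100k_base) and count tokens for a string.
const enc = encoding_for_model("gpt-3.5-turbo");
const tokens = enc.encode("How many tokens does this sentence use?");
console.log(tokens.length); // token count for this content
enc.free(); // release the wasm-backed encoder when done
```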
Alternative
Maybe there is a way to implement it on the back-end by providing an API, but I have not succeeded in achieving that so far because it seems impossible to load a wasm file when deploying on Vercel. I have followed the tutorial on Vercel docs and tried some plugins to load the wasm file but failed. If anyone knows about this, please let me know! 😁
Additional context
I have not optimized my code, but it suffices for now. There are some bugs, as shown below:
The first completion is primed with `\n\n`, and 20 tokens are used. After some testing, I have observed that the completion's token count seems to equal the number of tokens of the completion content only, meaning the special tokens and line breaks are not included (please refer to the code for more details). The second completion has exactly the same content as the first one but is not primed with `\n\n`. Since `\n\n` is encoded as a single token (271), it accounts for one token, so the result is 19, which is exactly what we expected. But the paradox is that the daily usage page reports 19 for both. I have no idea why; it requires further testing.
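A quick way to check the `\n\n` part, assuming a cl100k_base encoder from @dqbd/tiktoken (this is only a sketch of the check, with a placeholder string, not my implementation):

```ts
import { get_encoding } from "@dqbd/tiktoken";

const enc = get_encoding("cl100k_base");

// Per my observation, "\n\n" encodes to a single token (id 271).
console.log(enc.encode("\n\n"));

// If the leading "\n\n" does not merge with the following text, the primed
// completion should count exactly one more token (20 vs 19 above).
const content = "Hello! How can I assist you today?";
console.log(enc.encode("\n\n" + content).length - enc.encode(content).length);

enc.free();
```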
If you know about this, please let me know! I would appreciate it.
In addition, my implementation is still quite rough and only supports the 'gpt-3.5' model; I have not tested it on other models. If you have any advice, please let me know as well.