ceifa / tiktoken-node

OpenAI's tiktoken but with node bindings

gpt-3.5-turbo-0301 wrong token count #7

Closed loretoparisi closed 1 year ago

loretoparisi commented 1 year ago

Doing

    const tiktoken = require('tiktoken-node')
    const encoding = tiktoken.getEncoding("cl100k_base")
    const str = "Correct the spelling and grammar\n\nShe no went to the market."
    const encoded = encoding.encode(str);
    for (let token of encoded) {
        console.log({ token, string: encoding.decode([token]) })
    }

I get 14 tokens:

{ token: 42779, string: 'Correct' }
{ token: 262, string: ' the' }
{ token: 24993, string: ' spelling' }
{ token: 290, string: ' and' }
{ token: 23491, string: ' grammar' }
{ token: 198, string: '\n' }
{ token: 198, string: '\n' }
{ token: 3347, string: 'She' }
{ token: 645, string: ' no' }
{ token: 1816, string: ' went' }
{ token: 284, string: ' to' }
{ token: 262, string: ' the' }
{ token: 1910, string: ' market' }
{ token: 13, string: '.' }

In the API, text-davinci-003 gives me 14 prompt_tokens, which matches, while gpt-3.5-turbo-0301 gives me 21 prompt_tokens. How can I match that in the module? If I try

const encoding = tiktoken.encodingForModel("gpt-3.5-turbo")

it is still returning 14 tokens.
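
For context, text-davinci-003 and the chat models use different encodings (p50k_base vs cl100k_base), so even the raw counts need not match. A quick side-by-side check (a sketch; it assumes tiktoken-node exposes both upstream encoding names, as the Python tiktoken does):

    const tiktoken = require('tiktoken-node')

    const str = "Correct the spelling and grammar\n\nShe no went to the market."
    for (const name of ["p50k_base", "cl100k_base"]) {
        // getEncoding is shown above with "cl100k_base"; "p50k_base" is assumed here
        const enc = tiktoken.getEncoding(name)
        console.log(name, enc.encode(str).length)
    }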

ceifa commented 1 year ago

I'm a little bit confused. Which model are you using, how many tokens did you expect, and how many did you get?

loretoparisi commented 1 year ago

So when I'm using via API "gpt-3.5-turbo-0301", I get:

"usage": {
    "prompt_tokens": 21,
    "completion_tokens": 8,
    "total_tokens": 29
  },

while using via API "text-davinci-003" I get

"usage": {
    "prompt_tokens": 14,
    "completion_tokens": 10,
    "total_tokens": 24
  }

for the same input text "Correct the spelling and grammar\n\nShe no went to the market.". But I cannot reproduce this result with the module using the "cl100k_base" encoding, which should be the one used by gpt-3.5-turbo-0301.

ceifa commented 1 year ago

ChatGPT models consume tokens in a different way. For reference: https://github.com/openai/openai-python/blob/main/chatml.md https://community.openai.com/t/counting-tokens-for-chat-api-calls-gpt-3-5-turbo/81974
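
Concretely (my paraphrase of the cookbook numbers, not an official formula), each chat message carries a fixed framing overhead on top of its encoded fields, and the whole prompt is primed for the reply. For gpt-3.5-turbo-0301:

    prompt_tokens ≈ sum over messages of (4 + tokens(role) + tokens(name) + tokens(content)) + 3

For a single user message that is 4 + 1 (for "user") + 3 = 8 tokens of overhead. If the example sentence encodes to 13 tokens under cl100k_base (plausible, since "\n\n" merges into a single token there), 13 + 8 = 21, which would match the 21 prompt_tokens reported by the chat API.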

loretoparisi commented 1 year ago

ChatGPT models consume tokens in a different way. For reference: https://github.com/openai/openai-python/blob/main/chatml.md https://community.openai.com/t/counting-tokens-for-chat-api-calls-gpt-3-5-turbo/81974

Okay so the method in JavaScript should be

    /**
     * Returns the number of tokens used by a list of messages.
     * @param {Array<{role: string, name?: string, content: string}>} messages
     * @param {string} model
     * @returns {number}
     */
    function numTokensFromMessages(messages, model = "gpt-3.5-turbo-0301") {
        let encoding;
        try {
            encoding = tiktoken.encodingForModel(model)
        } catch (e) {
            console.log("Warning: model not found. Using cl100k_base encoding.")
            encoding = tiktoken.getEncoding("cl100k_base")
        }
        if (model == "gpt-3.5-turbo") {
            console.log("Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.")
            return numTokensFromMessages(messages, "gpt-3.5-turbo-0301")
        } else if (model == "gpt-4") {
            console.log("Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.")
            return numTokensFromMessages(messages, "gpt-4-0314")
        }
        let tokens_per_message, tokens_per_name;
        if (model == "gpt-3.5-turbo-0301") {
            tokens_per_message = 4  // every message follows <|start|>{role/name}\n{content}<|end|>\n
            tokens_per_name = -1  // if there's a name, the role is omitted
        } else if (model == "gpt-4-0314") {
            tokens_per_message = 3
            tokens_per_name = 1
        } else {
            throw new Error(`num_tokens_from_messages() is not implemented for model ${model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.`)
        }
        let num_tokens = 0
        for (const message of messages) {
            num_tokens += tokens_per_message
            for (const [key, value] of Object.entries(message)) {
                num_tokens += encoding.encode(value).length
                if (key == "name") {
                    num_tokens += tokens_per_name
                }
            }
        }
        num_tokens += 3  // every reply is primed with <|start|>assistant<|message|>
        return num_tokens
    }//numTokensFromMessages

Despite that, I'm still getting 21 tokens from the API call

"usage": {
    "prompt_tokens": 21,
    "completion_tokens": 8,
    "total_tokens": 29
  },

while calling


const str = "Correct the spelling and grammar\n\nShe no went to the market."
const messages = [{
    role: "user",
    name: "",
    content: str
}];
const num_tokens = numTokensFromMessages(messages, "gpt-3.5-turbo")
// logs: Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.
// num_tokens = 20
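
A plausible cause of the remaining off-by-one (my reading of the function above, not confirmed against the API): the message passes an empty name, so the loop adds tokens_per_name = -1 while encode("") contributes nothing; the API presumably does not discount anything for a name it never sees. Omitting the unused field should line the count up:

    // Hypothetical fix: leave out the empty name field entirely
    const messages = [{
        role: "user",
        content: str
    }];
    const num_tokens = numTokensFromMessages(messages, "gpt-3.5-turbo")
    // expected: num_tokens = 21, matching prompt_tokens from the API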
Felizolinha commented 1 year ago

The code above returns the correct values for the number of tokens in my prompts. For the answers, I had to remove 8 tokens from the count for it to be correct, but I'm not sure why.
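
One guess at where the 8 comes from (not verified): for a single message the function adds tokens_per_message (4), the encoded role (1 token for "assistant"), and the +3 reply priming, i.e. 4 + 1 + 3 = 8 tokens of prompt-side framing that the API's completion_tokens, which counts only the generated content, never includes.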

Here's the updated code (in TypeScript, but with a few ts-ignores):


function numTokensFromMessages(messages: { role: string; name?: string; content: string }[], isReply: boolean, model = "gpt-3.5-turbo-0301") {
    let encoding: tiktoken.Encoding;
    try {
        encoding = tiktoken.encodingForModel(model)
    } catch (e) {
        console.log("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.getEncoding("cl100k_base")
    }
    let tokens_per_message = 0;
    let tokens_per_name = 0;
    if (model == "gpt-3.5-turbo") {
        console.log("Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.")
        return numTokensFromMessages(messages, isReply, "gpt-3.5-turbo-0301")
    } else if (model == "gpt-4") {
        console.log("Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.")
        return numTokensFromMessages(messages, isReply, "gpt-4-0314")
    } else if (model == "gpt-3.5-turbo-0301") {
        tokens_per_message = 4  // every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  // if there's a name, the role is omitted
    } else if (model == "gpt-4-0314") {
        tokens_per_message = 3
        tokens_per_name = 1
    } else {
        throw new Error(`num_tokens_from_messages() is not implemented for model ${model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.`)
    }
    let num_tokens = 0
    for (const message of messages) {
        num_tokens += tokens_per_message
        Object.keys(message).forEach(key => {
            // @ts-ignore -- index access on a structurally-typed message
            const value = message[key];
            num_tokens += encoding.encode(value).length
            if (key == "name") {
                num_tokens += tokens_per_name
            }
        });
    }
    num_tokens += 3  // every reply is primed with <|start|>assistant<|message|>
    // replies don't carry the per-message framing or the priming overhead
    return num_tokens + (isReply ? -8 : 0)
}
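
For completeness, a usage sketch under the same assumptions (the isReply flag strips the 8 framing tokens that only apply on the prompt side):

// Counting the tokens a chat prompt will consume:
const promptTokens = numTokensFromMessages(
    [{ role: "user", content: "Correct the spelling and grammar\n\nShe no went to the market." }],
    false
)

// Counting the tokens of a received assistant reply:
const replyTokens = numTokensFromMessages(
    [{ role: "assistant", content: "She did not go to the market." }],
    true
)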