Generate API vs Chat API billing question

eugene-graft commented 2 months ago

Hi Cohere team. Since the Generate API is deprecated, I'm considering migrating to the new Chat API. While testing the new API I found a significant difference in billed input tokens between the old and new APIs. I have not found anything relevant in the docs.

Consider the example below. The prompt (message) is 6 tokens long, and the Generate API bills for 6 input tokens. Could you please explain why the new Chat API bills for 57 tokens given the same input?

cohere SDK version: 5.5.5

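import cohere

cl = cohere.Client("")  # API key omitted
model = "command"
message = "hello, how are you?"

# Token count from the tokenize endpoint
len(cl.tokenize(text=message, model=model).tokens)
6

# Token count from the offline (local) tokenizer
len(cl.tokenize(text=message, model=model, offline=True).tokens)
6

# Billed input tokens, Generate API
cl.generate(prompt=message, model=model).meta.billed_units.input_tokens
6

# Billed input tokens, Chat API
cl.chat(message=message, model=model).meta.billed_units.input_tokens
57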

billytrend-cohere commented 2 months ago

Hi @eugene-graft, please set preamble="" on cl.chat, e.g.:

cl.chat(message=message, model=model, preamble="").meta.billed_units.input_tokens

billytrend-cohere commented 2 months ago

Alternatively, you can use command-r or command-r-plus. These models don't charge for the preamble.
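
To see how much of the Chat API's billed input comes from the preamble for a given model, the two calls can be compared directly. The following is only a rough sketch, not an official Cohere recipe: the helper names billed_input_tokens and preamble_overhead are made up for illustration, and the code assumes a cohere.Client from SDK 5.x whose chat() response exposes meta.billed_units.input_tokens, as in the snippets above. The thread does not report the exact count after setting preamble="", so no expected number is shown.

```python
from typing import Any, Optional


def billed_input_tokens(client: Any, message: str, model: str,
                        preamble: Optional[str] = None) -> int:
    """Return the billed input token count for a single chat call.

    Assumption: `client` behaves like cohere.Client (SDK 5.x) and the
    response has meta.billed_units.input_tokens, as shown in this thread.
    """
    kwargs = {"message": message, "model": model}
    if preamble is not None:
        # Pass preamble only when the caller set it; None keeps the default.
        kwargs["preamble"] = preamble
    response = client.chat(**kwargs)
    return int(response.meta.billed_units.input_tokens)


def preamble_overhead(client: Any, message: str, model: str) -> int:
    """Billed tokens with the default preamble minus billed tokens with preamble=""."""
    with_default = billed_input_tokens(client, message, model)
    without_preamble = billed_input_tokens(client, message, model, preamble="")
    return with_default - without_preamble


# Hypothetical usage with a real client (makes two billed API calls):
#   import cohere
#   cl = cohere.Client("")  # supply your API key
#   print(preamble_overhead(cl, "hello, how are you?", "command"))
```

Running the same comparison against command-r or command-r-plus should, per the comment above, show no extra charge for the preamble.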

eugene-graft commented 2 months ago

This works, thank you

cohere-ai / cohere-python

Generate API vs Chat API billing question #522