IBM / ibm-generative-ai

IBM-Generative-AI is a Python library built on IBM's large language model REST interface to seamlessly integrate and extend this service in Python programs.
https://ibm.github.io/ibm-generative-ai/
Apache License 2.0

Langchain extension token_usage structure doesn't match convention #217

Closed: ind1go closed this issue 1 year ago

ind1go commented 1 year ago

Version Information

What is the expected behavior?

There's a convention that LangChain generation returns token_usage info in llm_output, and that dict contains the keys prompt_tokens, completion_tokens, and total_tokens.
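For illustration, the conventional shape looks roughly like this (as produced by, for example, LangChain's OpenAI wrapper; the numbers are made up):

llm_output = {
    "model_name": "...",
    "token_usage": {
        "prompt_tokens": 12,
        "completion_tokens": 42,
        "total_tokens": 54,
    },
}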

What is the actual behavior?

LangChainInterface returns token_usage, but under different key names. That makes it harder to combine with other tools and frameworks that rely on the conventional structure.
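For example, the dict looks something like the following (the key names here are an assumption, mirroring the field names of the underlying REST API; see the linked source below for the real ones):

# Hypothetical illustration: the conventional keys are absent,
# so lookups like token_usage["prompt_tokens"] raise KeyError.
token_usage = {
    "input_token_count": 12,       # assumed key name
    "generated_token_count": 42,   # assumed key name
}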

Please provide a unit test that demonstrates the bug.

# First run a generation; the result carries llm_output: LLMOutput.
llm_output = result.llm_output
if "token_usage" in llm_output:
    token_usage = llm_output["token_usage"]
    print(token_usage["prompt_tokens"])       # Bang! KeyError
    print(token_usage["completion_tokens"])   # Bang! KeyError
    print(token_usage["total_tokens"])        # Bang! KeyError

Other notes on how to reproduce the issue?

Any possible solutions?

Presumably it would be best to retain the existing keys (so nothing currently relying on them breaks) and duplicate their values under the conventional key names.
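A minimal sketch of that approach, assuming the current key names shown above (the helper name normalize_token_usage is hypothetical, not part of the library):

def normalize_token_usage(token_usage: dict) -> dict:
    # Keep the original keys and add the LangChain-conventional
    # aliases alongside them.
    aliases = {
        "input_token_count": "prompt_tokens",          # assumed current key
        "generated_token_count": "completion_tokens",  # assumed current key
    }
    normalized = dict(token_usage)
    for old_key, new_key in aliases.items():
        if old_key in normalized:
            normalized.setdefault(new_key, normalized[old_key])
    normalized.setdefault(
        "total_tokens",
        normalized.get("prompt_tokens", 0) + normalized.get("completion_tokens", 0),
    )
    return normalized

With that in place, existing consumers keep working while the unit test above passes.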

Can you identify the location in the GENAI source code where the problem exists?

Perhaps around https://github.com/IBM/ibm-generative-ai/blob/54c81292211632d7d4cab0af5415f5f89b0b1a67/src/genai/extensions/langchain/llm.py#L90-L93

If the bug is confirmed, would you be willing to submit a PR?

Yes

Tomas2D commented 1 year ago

Hey Ben,

Thanks for bringing this up. This is definitely something we should align. I can look at it during the upcoming week, but if you'd like to do it yourself, feel free to open a PR. Just let me know.