deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Homogenize Generator meta output #7687

Open nachollorca opened 1 month ago

nachollorca commented 1 month ago

Is your feature request related to a problem? Please describe. The dictionaries returned by generators share keys only to an extent: some use different names for the same concept, mainly within the metadata / usage keys. For example:
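Roughly, the divergence looks like this (an illustrative sketch with made-up token counts; the key names are the ones that come up later in this thread):

# one generator's reply meta (OpenAI-style naming)
{'finish_reason': 'stop', 'usage': {'prompt_tokens': 18, 'completion_tokens': 58}}

# another generator's reply meta for the same concepts (some generators also use 'stop_reason' instead of 'finish_reason')
{'finish_reason': 'end_turn', 'usage': {'input_tokens': 18, 'output_tokens': 58}}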

Describe the solution you'd like I would like generators to output the same schema and keys wherever possible, i.e. for matching concepts. In particular, I would prefer input_tokens/output_tokens and stop_reason.

Describe alternatives you've considered So far, in apps where we swap generators depending on the use case, the alternative is:

meta = pipe_result["answer_generator"]["replies"][0].meta
usage = meta["usage"]

# Bridge the different key names the generators use for the same concepts
output.input_tokens = usage["prompt_tokens"] if "prompt_tokens" in usage else usage["input_tokens"]
output.output_tokens = usage["completion_tokens"] if "completion_tokens" in usage else usage["output_tokens"]
output.finish_reason = meta["finish_reason"] if "finish_reason" in meta else meta["stop_reason"]

vblagoje commented 1 month ago

Taking into account the tendency of other LLM providers/libraries to converge on the OpenAI API, let's go with the OpenAI naming scheme.
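A minimal sketch of what converging on the OpenAI names could look like for a consumer in the meantime (a hypothetical helper, not part of Haystack; it only renames the keys discussed above):

def to_openai_meta(meta: dict) -> dict:
    # Hypothetical helper: map the alternative key names onto the OpenAI-style ones.
    normalized = dict(meta)
    usage = dict(normalized.get("usage", {}))
    if "input_tokens" in usage:
        usage["prompt_tokens"] = usage.pop("input_tokens")
    if "output_tokens" in usage:
        usage["completion_tokens"] = usage.pop("output_tokens")
    if "stop_reason" in normalized:
        normalized["finish_reason"] = normalized.pop("stop_reason")
    normalized["usage"] = usage
    return normalized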

nachollorca commented 1 month ago

Alright, I'll go for OpenAI's convention :)

CarlosFerLo commented 1 month ago

I've been having a look at all the generators and all the keys seem to follow the OpenAI standard.

mrm1001 commented 1 week ago

Hi @nachollorca, I am the product manager of Haystack and I'd like to reach out to users to better understand their use cases and see how we can offer better support. Would you have 15 minutes in the next few weeks for a quick chat? https://calendly.com/maria-mestre-ugu/haystack-catch-up

nachollorca commented 1 week ago

@CarlosFerLo

I've been having a look at all the generators and all the keys seem to follow the OpenAI standard.

At least Cohere, Anthropic and AmazonBedrock use 'meta': {'model': 'claude-2.1', 'index': 0, 'finish_reason': 'end_turn', 'usage': {'input_tokens': 18, 'output_tokens': 58}} instead of prompt_tokens/completion_tokens.

CarlosFerLo commented 1 week ago

@nachollorca Hey, these Generators are in haystack-extensions I believe; maybe we should move this issue there.

vblagoje commented 1 week ago

Hey @nachollorca @CarlosFerLo - most of the generators are in the https://github.com/deepset-ai/haystack-core-integrations/ GitHub project.