fhamborg opened this issue 3 months ago
Related to #6593
Thank you @fhamborg for the suggestion to truncate specific parts of the prompt. We are tracking this with #6593. Regarding the error caused by the input length, you could set a maximum length for truncation as part of the generation_kwargs of the HuggingFaceAPIGenerator. Does that work for you as a workaround? https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation
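For reference, a rough (untested) sketch of what that could look like, assuming Haystack 2.x's HuggingFaceAPIGenerator constructor and the truncate argument of huggingface_hub's InferenceClient.text_generation; the model name and token limits are placeholders:

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

# Sketch only: cap the number of input tokens via generation_kwargs.
# "truncate" is the input-length parameter of InferenceClient.text_generation;
# availability and exact behavior may depend on the huggingface_hub/TGI version.
generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "mistralai/Mistral-7B-Instruct-v0.2"},  # placeholder model
    token=Secret.from_env_var("HF_API_TOKEN"),
    generation_kwargs={
        "max_new_tokens": 350,  # output length
        "truncate": 1024,       # maximum number of input tokens kept
    },
)
```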
Thanks @julian-risch for the quick reply! As for setting the truncation parameter to some value, I guess that while it would help to avoid the error above, it would cut off the actual question in cases where the input is too long (as the question is the last item in my prompt), which would be worse.
Is there a way to retrieve the actual input to the LLM (or rather the text that is converted to that input), i.e., the potentially truncated input? This way I could compare my full prompt with the actual one (after potential truncation), and if it in fact was truncated, I could rerun the pipeline with top_k for the retriever component set one lower, for example. Or would you think it'd be better to just catch the exception above and then rerun with a decreased top_k?
EDIT: I just figured out that the top_k parameter has to be set when creating the pipeline, not when running it. So the above idea unfortunately wouldn't work (unless I recreated the pipeline each time the situation above occurs). Do you have any idea how to both avoid the error above and avoid cutting off the question, other than setting top_k to a very low value (in which case the error would still come up at some point, e.g., once the chat history gets long)?
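For illustration, a minimal sketch of the recreate-and-retry idea; build_pipeline and the "prompt_builder"/"question" input names are hypothetical placeholders, and the except clause should ideally catch only the specific input-length error rather than a bare Exception:

```python
def run_with_shrinking_top_k(question: str, start_top_k: int = 10, min_top_k: int = 1):
    """Recreate the pipeline with a smaller top_k until the input fits."""
    last_error = None
    for top_k in range(start_top_k, min_top_k - 1, -1):
        pipeline = build_pipeline(top_k=top_k)  # hypothetical helper that rebuilds the RAG pipeline
        try:
            return pipeline.run({"prompt_builder": {"question": question}})
        except Exception as exc:  # ideally: catch only the input-length error
            last_error = exc
    raise last_error
```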
@julian-risch

> Thank you @fhamborg for the suggestion to truncate specific parts of the prompt. We are tracking this with #6593. Regarding the error caused by the input length, you could set a maximum length for truncation as part of the generation_kwargs of the HuggingFaceAPIGenerator. Does that work for you as a workaround? https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation
What's the exact name of the parameter? The linked page contains neither a max_length nor a truncation parameter.
My use case is slightly different, as I'm trying to work around this bug with the HuggingFaceLocalGenerator. Setting max_length (in generation_kwargs) there only applies to the output length and won't truncate the input, so it crashes my application.
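As a stopgap, an untested sketch: truncate the rendered prompt with the model's own tokenizer before it reaches the HuggingFaceLocalGenerator. Note that this cuts from the end, so a question placed last in the prompt may still be lost; the model name and token budget are placeholders:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")  # placeholder model

def truncate_prompt(prompt: str, max_input_tokens: int = 512) -> str:
    """Keep only the first max_input_tokens tokens of the prompt."""
    ids = tokenizer.encode(prompt, truncation=True, max_length=max_input_tokens)
    return tokenizer.decode(ids, skip_special_tokens=True)
```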
Describe the bug
When using a standard RAG pipeline I get the above error.
Error message
Expected behavior
My expectation would be that truncation is built in so that not too many tokens are passed to the model. Ideally the input should be truncated not at the end of the prompt (in which case the question would be cut off) but at a specific part (e.g., truncating the tokens of my top_k=10 documents rather than using all of them).
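For illustration only, a rough sketch of that kind of selective truncation: trimming each retrieved document's text to a token budget before the prompt is built, so the question at the end survives. The tokenizer, budget, and function name are placeholders, not an existing Haystack feature:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")  # placeholder model

def trim_document_texts(doc_texts: list[str], tokens_per_doc: int = 200) -> list[str]:
    """Truncate each document's text to tokens_per_doc tokens, leaving the question untouched."""
    trimmed = []
    for text in doc_texts:
        ids = tokenizer.encode(text, truncation=True, max_length=tokens_per_doc)
        trimmed.append(tokenizer.decode(ids, skip_special_tokens=True))
    return trimmed
```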
To Reproduce
FAQ Check
System: