Closed lefnire closed 4 months ago
This would also be nice for other nodes. I personally wanted to play around a bit with 8-bit quantization for seq2seq models via bitsandbytes, and it's kind of frustrating to have to create a custom node just to pass a simple boolean flag into the pipeline's constructor.
I'd like to work on this. Can someone explain further what lefnire means by "args to predict"?
To allow the Haystack node to pass kwargs to the generate method of HF pipelines, we can follow the implementation of generation_kwargs in the Transformers ImageToText node.
I had a look at the ImageToText node, and it seems to handle generation_kwargs exactly as I would want it in other nodes.
I also played around a bit with the HFLocalInvocationLayer of the PromptNode and noticed that it filters the kwargs before passing them into the pipeline.
It's kind of confusing that different text generation nodes behave differently. Is there a possibility to enable passing a GenerationConfig instance into each generation-call that supports it? This could maybe streamline the behavior of the generation nodes. 🤔
Yeah, I experimented with generate_kwargs while trying to find an HF model that can run an Agent. Still no luck, but there is no reason not to add support for generate_kwargs to the model parameters. We filter because we want to provide a great user experience (easily create any PromptNode) while running various models underneath PromptNode. I see no other way to let all models pass all their parameters via PromptNode and have all of them work seamlessly. Is there any approach other than filtering in PromptNodeInvocationLayer?
@vblagoje This actually makes a lot of sense, and I get why it's important to provide a good user experience. But I guess the parameter filtering is already handled by the transformers library when using a GenerationConfig instead of passing the generation parameters manually.
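To make the filtering idea concrete, here is a minimal sketch of allow-list filtering in plain Python. The function name and the list of accepted parameter names are assumptions made for illustration; they are not copied from Haystack's invocation layer, which may accept a different set.

```python
from typing import Any, Dict, Iterable

# Hypothetical allow-list; the real invocation layer may accept other names.
ALLOWED_GENERATION_KWARGS = ("max_new_tokens", "num_beams", "temperature", "top_k", "top_p")


def filter_kwargs(
    kwargs: Dict[str, Any], allowed: Iterable[str] = ALLOWED_GENERATION_KWARGS
) -> Dict[str, Any]:
    """Keep only the keyword arguments whose names are on the allow-list."""
    allowed_set = set(allowed)
    return {name: value for name, value in kwargs.items() if name in allowed_set}


# Unknown keys are dropped silently instead of reaching the pipeline.
user_kwargs = {"num_beams": 4, "top_p": 0.9, "api_key": "not-a-generation-option"}
filtered = filter_kwargs(user_kwargs)
```

The trade-off discussed in this thread shows up directly: any generation parameter missing from the allow-list is discarded, even if the underlying model would accept it.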
At least as of transformers version 4.26 every generate method should support the GenerationConfig. And as stated in the documentation every generation model should support it ("A generate call supports the following generation methods for text-decoder, text-to-text, speech-to-text, and vision-to-text models").
And another off-topic question: is there a possibility to extend the PromptNodeInvocationLayer to also support PEFT fine-tuned models? Many of the instruction fine-tuned models only provide LoRA weights, which need to be loaded together with an additional base model.
I'm running into this when trying to use databricks/dolly-v2-12b as part of a PromptNode. This model, by default, requires trust_remote_code=True.
What is the recommended approach to use such a model as part of PromptNode?
>>> from haystack.nodes import PromptNode
>>> prompt_node = PromptNode(model_name_or_path="databricks/dolly-v2-12b")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
init_func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/prompt_node.py", line 119, in __init__
self.prompt_model = PromptModel(
File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
init_func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/prompt_model.py", line 71, in __init__
self.model_invocation_layer = self.create_invocation_layer(invocation_layer_class=invocation_layer_class)
File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/prompt_model.py", line 100, in create_invocation_layer
return invocation_layer(
File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/invocation_layer/hugging_face.py", line 123, in __init__
self.pipe = pipeline(
File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 674, in pipeline
raise ValueError(
ValueError: Loading this pipeline requires you to execute the code in the pipeline file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
I'm encountering the same issue as @recursionbane when attempting to load Databricks' Dolly 2.0 model. I've tried adding trust_remote_code to the model_kwargs dict argument of PromptModel, but this leads to the following error stack:
----> 1 prompt_model, retriever, reader, document_merger, document_store = load_models(document_store)
/tmp/ipykernel_431/1315538438.py in load_models(document_store, search_api_key, llm_model, embedding_model, reader_model)
9 ):
10
---> 11 prompt_model = PromptModel(
12 model_name_or_path=llm_model,
13 use_gpu=True,
~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/base.py in wrapper_exportable_to_yaml(self, *args, **kwargs)
44
45 # Call the actuall __init__ function with all the arguments
---> 46 init_func(self, *args, **kwargs)
47
48 return wrapper_exportable_to_yaml
~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/prompt/prompt_model.py in __init__(self, model_name_or_path, max_length, api_key, use_auth_token, use_gpu, devices, invocation_layer_class, model_kwargs)
69
70 self.model_kwargs = model_kwargs if model_kwargs else {}
---> 71 self.model_invocation_layer = self.create_invocation_layer(invocation_layer_class=invocation_layer_class)
72 is_instruction_following: bool = any(m in model_name_or_path for m in instruction_following_models())
73 if not is_instruction_following:
~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/prompt/prompt_model.py in create_invocation_layer(self, invocation_layer_class)
98 for invocation_layer in PromptModelInvocationLayer.invocation_layer_providers:
99 if invocation_layer.supports(self.model_name_or_path, **all_kwargs):
--> 100 return invocation_layer(
101 model_name_or_path=self.model_name_or_path, max_length=self.max_length, **all_kwargs
102 )
~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/prompt/invocation_layer/hugging_face.py in __init__(self, model_name_or_path, max_length, use_auth_token, use_gpu, devices, **kwargs)
121 logger.info("Using model input kwargs %s in %s", model_input_kwargs, self.__class__.__name__)
122 self.task_name = get_task(model_name_or_path, use_auth_token=use_auth_token)
--> 123 self.pipe = pipeline(
124 model=model_name_or_path,
125 device=self.devices[0] if "device_map" not in model_input_kwargs else None,
~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
643 hub_kwargs["_commit_hash"] = config._commit_hash
644 elif config is None and isinstance(model, str):
--> 645 config = AutoConfig.from_pretrained(model, _from_pipeline=task, **hub_kwargs, **model_kwargs)
646 hub_kwargs["_commit_hash"] = config._commit_hash
647
TypeError: transformers.models.auto.configuration_auto.AutoConfig.from_pretrained() got multiple values for keyword argument 'trust_remote_code'
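Reading the last frame of the traceback, the call unpacks both hub_kwargs and model_kwargs into the same function, so a key present in both dicts is passed twice. The sketch below reproduces that failure mode in plain Python. The from_pretrained function here is a hypothetical stand-in, not the transformers implementation, and the dict contents are assumptions chosen to mirror the traceback.

```python
def from_pretrained(model, trust_remote_code=False, **kwargs):
    """Hypothetical stand-in for a loader that accepts trust_remote_code."""
    return {"model": model, "trust_remote_code": trust_remote_code}


hub_kwargs = {"trust_remote_code": None}    # assumed: filled in by the caller
model_kwargs = {"trust_remote_code": True}  # assumed: supplied by the user

# Unpacking two dicts that share a key passes the same keyword twice.
try:
    from_pretrained("some/model", **hub_kwargs, **model_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Merging the dicts first (later values win) passes the keyword only once.
merged = {**hub_kwargs, **model_kwargs}
result = from_pretrained("some/model", **merged)
```

Under this reading, the flag would need to reach the pipeline as its own trust_remote_code argument rather than through model_kwargs, which is why adding it to model_kwargs alone does not work.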
I think any Instruct-tuned model is going to require trust_remote_code, like the just-released MPT-7B-Instruct, which is also failing in Haystack for the same reason.
I edited this line in my local transformers installation to set trust_remote_code: Optional[bool] = True, and am able to proceed.
Note that MPT models are not currently supported in Haystack.
@Ok3ks Are you still working on this issue? If not, I would like to work on this. I can open a PR for the same.
Hi @awinml, thanks for your interest! Let's discuss the implementation...
For which node do you want to start implementing this feature?
For the PromptNode it has been implemented, I think. (Am I wrong, @vblagoje?)
@anakin87 I think we can start with the Summarizer node.
I was planning on creating a generation_kwargs parameter as mentioned in https://github.com/deepset-ai/haystack/issues/3650#issuecomment-1488645119.
To allow the Haystack node to pass kwargs to the generate method of HF pipelines, we can follow the implementation of generation_kwargs in the Transformers ImageToText node.
The Summarizer node implementation already captures the max_length and min_length arguments. We can just unpack those values from the generation_kwargs without modifying most of the other code.
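A rough sketch of that plan is below. The class name SummarizerSketch, its constructor defaults, and the fake_pipe stand-in are hypothetical and exist only so the example runs without downloading a model; the one assumption carried over from Hugging Face is that a summarization pipeline returns a list of dicts with a summary_text key.

```python
from typing import Any, Callable, Dict, List, Optional


class SummarizerSketch:
    """Hypothetical node that forwards generation_kwargs to an HF-style pipeline."""

    def __init__(
        self,
        pipe: Callable[..., List[Dict[str, str]]],
        max_length: int = 200,
        min_length: int = 5,
        generation_kwargs: Optional[Dict[str, Any]] = None,
    ):
        self.pipe = pipe
        self.generation_kwargs = dict(generation_kwargs or {})
        # Lengths given in generation_kwargs override the explicit arguments.
        self.max_length = self.generation_kwargs.pop("max_length", max_length)
        self.min_length = self.generation_kwargs.pop("min_length", min_length)

    def predict(
        self, texts: List[str], generation_kwargs: Optional[Dict[str, Any]] = None
    ) -> List[str]:
        # Per-call kwargs take precedence over the ones set at init time.
        call_kwargs = {**self.generation_kwargs, **(generation_kwargs or {})}
        max_length = call_kwargs.pop("max_length", self.max_length)
        min_length = call_kwargs.pop("min_length", self.min_length)
        outputs = self.pipe(texts, max_length=max_length, min_length=min_length, **call_kwargs)
        return [output["summary_text"] for output in outputs]


recorded_calls: List[Dict[str, Any]] = []


def fake_pipe(texts: List[str], **kwargs: Any) -> List[Dict[str, str]]:
    """Stand-in for a summarization pipeline; records the kwargs it receives."""
    recorded_calls.append(kwargs)
    return [{"summary_text": text[:10]} for text in texts]


summarizer = SummarizerSketch(fake_pipe, generation_kwargs={"num_beams": 6, "max_length": 50})
summaries = summarizer.predict(
    ["A long document about pipelines."], generation_kwargs={"no_repeat_ngram_size": 2}
)
```

With a real pipeline in place of fake_pipe, the extra keyword arguments would be forwarded to the model's generate call, so the existing max_length and min_length handling keeps working while other options pass through untouched.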
Nice :star:
@awinml, feel free to work on it and open a small PR!
Is your feature request related to a problem? Please describe.
I'd like to add Hugging Face pipeline arguments to nodes which use HF pipelines, or at least specifically to the Summarizer node. It currently accepts min_length, max_length (and of course model_name_or_path). There are a few really impactful hyperparameters in generation. Ones I like are no_repeat_ngram_size=2, num_beams=6, num_beam_groups=3, early_stopping=True, and I'm fiddling with diversity_penalty=1.0, repetition_penalty=1.0. If I get time I'll dig up the blog posts I found these in.
There are also some HF models which require some extra stuff on initialization, e.g. trust_remote_code for LSG models (I like lsg-wcep, which gives better "vanilla" summaries than xsum|cnn|etc. models; e.g. CNN summaries sound like news). In that case, it might be easier to allow model_name_or_path to take an initialized HF model, not just a string.
Describe the solution you'd like
Just add optional model-initialization args (or whatever you wanna call it) to init and another args to predict. These will just pass through to pipeline and summarizer.
Describe alternatives you've considered
I'll create a custom Summarizer node for now. I might need the extra customization above this request anyway, so it's a healthy approach. Just flagging this need in case others find it useful.