deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Add passthrough args for huggingface models (specifically Summarizer) #3650

Closed lefnire closed 4 months ago

lefnire commented 1 year ago

Is your feature request related to a problem? Please describe.

I'd like to add Hugging Face pipeline arguments to nodes which use HF pipelines, or at least specifically to the Summarizer node. It currently accepts min_length and max_length (and of course model_name_or_path). There are a few really impactful generation hyperparameters. Ones I like are no_repeat_ngram_size=2, num_beams=6, num_beam_groups=3, early_stopping=True, and I'm fiddling with diversity_penalty=1.0, repetition_penalty=1.0. If I get time I'll dig up the blog posts I found these in.
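
For reference, here is a minimal sketch of passing these directly to a plain Transformers summarization pipeline (the model name and input text are placeholders):

from transformers import pipeline

# sketch only: any summarization checkpoint works here
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(
    long_text,                 # placeholder input document
    min_length=30,
    max_length=130,
    no_repeat_ngram_size=2,
    num_beams=6,
    num_beam_groups=3,         # must divide num_beams evenly
    diversity_penalty=1.0,
    early_stopping=True,
)[0]["summary_text"]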

There are also some HF models which require extra arguments at initialization, e.g. trust_remote_code for LSG models (I like lsg-wcep, which gives better "vanilla" summaries than xsum|cnn|etc. models; e.g. CNN-trained summaries sound like news):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path, trust_remote_code=True)

In that case, it might be easier to allow model_name_or_path to accept an initialized HF model, not just a string.
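
For what it's worth, the Transformers pipeline factory already accepts pre-initialized objects, so a passthrough could build on that; a minimal sketch, assuming the model and tokenizer created above:

from transformers import pipeline

# model and tokenizer are the objects created above with trust_remote_code=True
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)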

Describe the solution you'd like

Just add optional model-initialization args (or whatever you want to call them) to __init__ and another set of args to predict. These would simply pass through to the underlying pipeline and summarizer.
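
Roughly like this, as a sketch of the proposed call sites; model_kwargs and generation_kwargs here are hypothetical parameter names, not the current TransformersSummarizer API:

from haystack.nodes import TransformersSummarizer

# hypothetical signatures, for illustration only
summarizer = TransformersSummarizer(
    model_name_or_path="some/lsg-wcep-model",   # placeholder checkpoint name
    model_kwargs={"trust_remote_code": True},   # would pass through to from_pretrained / pipeline
)
summaries = summarizer.predict(
    documents=docs,                             # docs is a placeholder list of Documents
    generation_kwargs={"num_beams": 6, "no_repeat_ngram_size": 2},  # would pass through to generate
)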

Describe alternatives you've considered

I'll create a custom Summarizer node for now. I might need extra customization beyond this request anyway, so it's a healthy approach. Just flagging this need in case others find it useful.
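
For anyone taking the same route, a rough sketch of such a custom node, assuming Haystack v1's BaseComponent interface (class and parameter names are placeholders):

from haystack.nodes.base import BaseComponent
from transformers import pipeline

class CustomSummarizer(BaseComponent):
    outgoing_edges = 1

    def __init__(self, model_name_or_path, model_kwargs=None, generation_kwargs=None):
        super().__init__()
        # model_kwargs (e.g. trust_remote_code=True) go to pipeline construction
        self.pipe = pipeline("summarization", model=model_name_or_path, **(model_kwargs or {}))
        # generation_kwargs (e.g. num_beams, no_repeat_ngram_size) go to each call
        self.generation_kwargs = generation_kwargs or {}

    def run(self, documents):
        for doc in documents:
            doc.meta["summary"] = self.pipe(doc.content, **self.generation_kwargs)[0]["summary_text"]
        return {"documents": documents}, "output_1"

    def run_batch(self, documents):
        return self.run(documents=documents)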

LLukas22 commented 1 year ago

This would also be nice for other nodes. I personally wanted to play around a bit with 8-bit quantization for seq2seq models via bitsandbytes, and it's kind of frustrating to create a custom node only to pass a simple boolean flag into the pipeline's constructor.
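
For context, the flag in question is presumably load_in_8bit, which (with bitsandbytes and accelerate installed) is just:

from transformers import AutoModelForSeq2SeqLM

# 8-bit quantization at load time via bitsandbytes; model name is a placeholder
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl",
    load_in_8bit=True,
    device_map="auto",
)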

Ok3ks commented 1 year ago

I'd like to work on this. Can someone explain further what lefnire means by "args to predict"?

anakin87 commented 1 year ago

To allow the Haystack node to pass kwargs to the generate method of HF pipelines, we can follow the implementation of generation_kwargs in the TransformersImageToText node.
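
For context, usage there looks roughly like this (treat it as a sketch; the captioning model shown is an assumption):

from haystack.nodes import TransformersImageToText

image_to_text = TransformersImageToText(
    model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
    generation_kwargs={"num_beams": 4, "max_new_tokens": 30},  # forwarded to generate
)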

LLukas22 commented 1 year ago

Had a look at the ImageToText node, and it seems to handle the generation_kwargs exactly as I would want in other nodes.

I also played around a bit with the HFLocalInvocationLayer of the PromptNode and noticed that it filters the kwargs before passing them into the pipeline.
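
(For illustration, a generic filtering pattern of that kind might look like the sketch below; this is not the actual Haystack code.)

import inspect
from transformers import pipeline

def filter_kwargs(func, kwargs):
    # keep only the kwargs that the target callable actually accepts
    accepted = inspect.signature(func).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}

pipe_kwargs = filter_kwargs(pipeline, {"trust_remote_code": True, "some_unknown_arg": 1})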

It's kind of confusing that different text generation nodes behave differently. Is there a possibility to enable passing a GenerationConfig instance into each generation-call that supports it? This could maybe streamline the behavior of the generation nodes. 🤔

vblagoje commented 1 year ago

Yeah, I experimented with generate_kwargs now that I am trying to find an HF model that can run an Agent. Still no luck, but there's no reason not to add support for 'generate_kwargs' in the model parameters. We filter because we want to provide a great user experience (easily create any PromptNode) and run various models underneath PromptNode. I see no other way to enable all models to pass all their parameters via PromptNode and make all of them work seamlessly. Is there any other approach besides filtering in PromptNodeInvocationLayer?

LLukas22 commented 1 year ago

@vblagoje This actually makes a lot of sense, and I get why it's important to provide a good user experience. But I guess the parameter filtering is already handled by the transformers library when using a GenerationConfig instead of passing the generation parameters manually.

At least as of transformers version 4.26 every generate method should support the GenerationConfig. And as stated in the documentation every generation model should support it ("A generate call supports the following generation methods for text-decoder, text-to-text, speech-to-text, and vision-to-text models").
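
A minimal sketch of that usage (model name is a placeholder):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

# bundle the generation parameters in one object instead of passing them individually
gen_config = GenerationConfig(num_beams=6, no_repeat_ngram_size=2, early_stopping=True)

inputs = tokenizer("some long article text", return_tensors="pt")
outputs = model.generate(**inputs, generation_config=gen_config)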

And another off-topic question: is there a possibility to expand the PromptNodeInvocationLayer to also support PEFT fine-tuned models? Many of the instruction-fine-tuned models only provide LoRA weights, which need to be loaded on top of a base model.
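
For reference, loading such adapters typically looks something like this (both model names are placeholders):

from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
# the LoRA adapter repo only contains the fine-tuned low-rank weights
model = PeftModel.from_pretrained(base_model, "some-user/flan-t5-large-lora-adapter")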

recursionbane commented 1 year ago

I'm running into this when trying to use databricks/dolly-v2-12b as part of a PromptNode. This model, by default, requires trust_remote_code=True. What is the recommended approach to use such a model as part of PromptNode?

>>> from haystack.nodes import PromptNode
>>> prompt_node = PromptNode(model_name_or_path="databricks/dolly-v2-12b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
    init_func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/prompt_node.py", line 119, in __init__
    self.prompt_model = PromptModel(
  File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
    init_func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/prompt_model.py", line 71, in __init__
    self.model_invocation_layer = self.create_invocation_layer(invocation_layer_class=invocation_layer_class)
  File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/prompt_model.py", line 100, in create_invocation_layer
    return invocation_layer(
  File "/opt/conda/lib/python3.8/site-packages/haystack/nodes/prompt/invocation_layer/hugging_face.py", line 123, in __init__
    self.pipe = pipeline(
  File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 674, in pipeline
    raise ValueError(
ValueError: Loading this pipeline requires you to execute the code in the pipeline file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

Blee1077 commented 1 year ago

I'm encountering the same issue as @recursionbane when attempting to load Databricks' Dolly 2.0 model. I've tried adding trust_remote_code in the model_kwargs dict argument of PromptModel, but this leads to the following error stack:

----> 1 prompt_model, retriever, reader, document_merger, document_store = load_models(document_store)

/tmp/ipykernel_431/1315538438.py in load_models(document_store, search_api_key, llm_model, embedding_model, reader_model)
      9 ):
     10 
---> 11     prompt_model = PromptModel(
     12         model_name_or_path=llm_model,
     13         use_gpu=True,

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/base.py in wrapper_exportable_to_yaml(self, *args, **kwargs)
     44 
     45         # Call the actuall __init__ function with all the arguments
---> 46         init_func(self, *args, **kwargs)
     47 
     48     return wrapper_exportable_to_yaml

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/prompt/prompt_model.py in __init__(self, model_name_or_path, max_length, api_key, use_auth_token, use_gpu, devices, invocation_layer_class, model_kwargs)
     69 
     70         self.model_kwargs = model_kwargs if model_kwargs else {}
---> 71         self.model_invocation_layer = self.create_invocation_layer(invocation_layer_class=invocation_layer_class)
     72         is_instruction_following: bool = any(m in model_name_or_path for m in instruction_following_models())
     73         if not is_instruction_following:

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/prompt/prompt_model.py in create_invocation_layer(self, invocation_layer_class)
     98         for invocation_layer in PromptModelInvocationLayer.invocation_layer_providers:
     99             if invocation_layer.supports(self.model_name_or_path, **all_kwargs):
--> 100                 return invocation_layer(
    101                     model_name_or_path=self.model_name_or_path, max_length=self.max_length, **all_kwargs
    102                 )

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/haystack/nodes/prompt/invocation_layer/hugging_face.py in __init__(self, model_name_or_path, max_length, use_auth_token, use_gpu, devices, **kwargs)
    121             logger.info("Using model input kwargs %s in %s", model_input_kwargs, self.__class__.__name__)
    122         self.task_name = get_task(model_name_or_path, use_auth_token=use_auth_token)
--> 123         self.pipe = pipeline(
    124             model=model_name_or_path,
    125             device=self.devices[0] if "device_map" not in model_input_kwargs else None,

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    643         hub_kwargs["_commit_hash"] = config._commit_hash
    644     elif config is None and isinstance(model, str):
--> 645         config = AutoConfig.from_pretrained(model, _from_pipeline=task, **hub_kwargs, **model_kwargs)
    646         hub_kwargs["_commit_hash"] = config._commit_hash
    647 

TypeError: transformers.models.auto.configuration_auto.AutoConfig.from_pretrained() got multiple values for keyword argument 'trust_remote_code'

recursionbane commented 1 year ago

I think any Instruct-tuned model is going to require trust_remote_code, like the just-released MPT-7B-Instruct, which is also failing in Haystack for the same reason.

recursionbane commented 1 year ago

I edited this line in my local transformers installation to set: trust_remote_code: Optional[bool] = True, and am able to proceed.

Note that MPT models are not currently supported in Haystack.

awinml commented 12 months ago

@Ok3ks Are you still working on this issue? If not, I would like to work on this. I can open a PR for the same.

anakin87 commented 12 months ago

Hi @awinml, thanks for your interest! Let's discuss the implementation...

For which node do you want to start implementing this feature?

For the PromptNode it has been implemented, I think. (Am I wrong @vblagoje?)

awinml commented 12 months ago

@anakin87 I think we can start with the Summarizer Node.

I was planning on creating a generation_kwargs parameter as mentioned in https://github.com/deepset-ai/haystack/issues/3650#issuecomment-1488645119.

> To allow the Haystack node to pass kwargs to the generate method of HF pipelines, we can follow the implementation of generation_kwargs in the TransformersImageToText node.

The Summarizer node implementation already captures the max_length and min_length arguments. We can just unpack those values from the generation_kwargs without modifying most of the other code.
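
Roughly along these lines, as a sketch of the idea rather than the final implementation:

# inside the hypothetical predict() of the Summarizer node
generation_kwargs = dict(self.generation_kwargs or {})
generation_kwargs.setdefault("min_length", self.min_length)
generation_kwargs.setdefault("max_length", self.max_length)
summaries = self.summarizer(texts, **generation_kwargs)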

anakin87 commented 12 months ago

Nice :star:

@awinml, feel free to work on it and open a small PR!