deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Problem when using pipeline serialization #7277

Closed · Asma-droid closed this 6 months ago

Asma-droid commented 7 months ago

Hello,

I would like to use pipeline serialization. Below is my YAML for the llm:

llm:
    init_parameters:
      huggingface_pipeline_kwargs:
        model: mistralai/Mixtral-8x7B-Instruct-v0.1
        device_map: auto
        model_kwargs:
          bnb_4bit_compute_dtype: torch.bfloat16
          bnb_4bit_quant_type: nf4
          bnb_4bit_use_double_quant: true
          load_in_4bit: true
        #task: text-generation
      stop_words: null
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator

This code generates the error below:

~/.local/lib/python3.9/site-packages/bitsandbytes/nn/modules.py in forward(self, x)
    251         inp_dtype = x.dtype
    252         if self.compute_dtype is not None:
--> 253             x = x.to(self.compute_dtype)
    254 
    255         bias = None if self.bias is None else self.bias.to(self.compute_dtype)

RuntimeError: Invalid device string: 'torch.bfloat16'
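
For context, Tensor.to() treats an unrecognized string argument as a device name, which is why the literal string 'torch.bfloat16' produces this message. A minimal sketch (separate from my pipeline) that reproduces it:

import torch

# Tensor.to() parses a string argument as a device specifier; the literal
# string 'torch.bfloat16' is not a valid device name, hence:
# RuntimeError: Invalid device string: 'torch.bfloat16'
x = torch.ones(2)
x.to("torch.bfloat16")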

I don't know if there is another way to pass the compute dtype.

Best regards

anakin87 commented 7 months ago

Hey @Asma-droid...

Try with "torch.bfloat16" (quotes included).

Asma-droid commented 7 months ago

Hello @anakin87, I still have the same problem.

anakin87 commented 7 months ago

Please provide more information. Which specific beta version are you using? How did you generate this serialized pipeline? How are you deserializing it?

With this information, I will take a closer look tomorrow...

Asma-droid commented 7 months ago

@anakin87 Thanks a lot for the quick response!

I'm using haystack-ai 2.0.0b8.

My pipeline is:

# (document_store, prompt_builder and llm are defined elsewhere in my script)
from haystack import Pipeline
from haystack.components.builders import AnswerBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
query_pipeline.add_component("retriever", ElasticsearchEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", prompt_builder)
query_pipeline.add_component("llm", llm)
query_pipeline.add_component("AnswerBuilder", AnswerBuilder())

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "llm")
query_pipeline.connect("llm.replies", "AnswerBuilder")
# query_pipeline.connect("llm.meta", "AnswerBuilder")
query_pipeline.connect("retriever", "AnswerBuilder.documents")

I generated the serialized pipeline automatically using query_pipeline.dumps() and got:

components:
  AnswerBuilder:
    init_parameters:
      pattern: null
      reference_pattern: null
    type: haystack.components.builders.answer_builder.AnswerBuilder
  llm:
    init_parameters:
      generation_kwargs:
        max_new_tokens: 350
        return_full_text: false
      huggingface_pipeline_kwargs:
        device_map: auto
        model: mistralai/Mixtral-8x7B-Instruct-v0.1
        model_kwargs:
          bnb_4bit_compute_dtype: torch.bfloat16
          bnb_4bit_quant_type: nf4
          bnb_4bit_use_double_quant: true
          device_map: auto
          load_in_4bit: true
        task: text-generation
      stop_words: null
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
  prompt_builder:
    init_parameters:
      template: "\nAnswer the query based on the provided context. Please if the query\
        \ is in french, provide the answer in french also \nIf the context does not\
        \ contain the answer, say 'Answer not found'.\nContext:\n{% for doc in documents\
        \ %}\n  {{ doc.content }}\n{% endfor %}\nquery: {{query}}\nAnswer:\n"
    type: haystack.components.builders.prompt_builder.PromptBuilder
  retriever:
    init_parameters:
      document_store:
        init_parameters:
          embedding_similarity_function: cosine
          hosts: http://127.0.0.1:9200
          index: document-cfdt-new-asma
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
      filters: {}
      num_candidates: null
      top_k: 10
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
  text_embedder:
    init_parameters:
      batch_size: 32
      device:
        device: cuda:0
        type: single
      model: sentence-transformers/all-MiniLM-L6-v2
      normalize_embeddings: false
      prefix: ''
      progress_bar: true
      suffix: ''
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
connections:
- receiver: retriever.query_embedding
  sender: text_embedder.embedding
- receiver: prompt_builder.documents
  sender: retriever.documents
- receiver: AnswerBuilder.documents
  sender: retriever.documents
- receiver: llm.prompt
  sender: prompt_builder.prompt
- receiver: AnswerBuilder.replies
  sender: llm.replies
max_loops_allowed: 100
metadata: {}

I saved my pipeline to a file.
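
Roughly like this (a sketch; it assumes the dumps() output is written as-is to the path used below):

with open("./src/pipelines/rag_pipeline.yaml", "w") as f:
    # Persist the YAML produced by Pipeline.dumps()
    f.write(query_pipeline.dumps())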

Then I executed this code:

from haystack import Pipeline

with open("./src/pipelines/rag_pipeline.yaml", "rb") as f:
    rag_pipeline = Pipeline.loads(f.read())

I have also tried Pipeline.load(f).

I get the problem when I run:

query="What is Haystack?"
result = rag_piepeline.run(
     {"text_embedder": {"text": query},
                     "retriever": {"top_k": 10},
                     "prompt_builder": {"query": query},
                     "llm":{"generation_kwargs": {"max_new_tokens": 1000}},
                     "AnswerBuilder": {"query": query}})
anakin87 commented 7 months ago

Please also provide the code to initialize the llm.

Asma-droid commented 7 months ago

@anakin87

llm:
  init_parameters:
    huggingface_pipeline_kwargs:
      model: mistralai/Mixtral-8x7B-Instruct-v0.1
      device_map: auto
      model_kwargs:
        bnb_4bit_compute_dtype: torch.bfloat16
        bnb_4bit_quant_type: nf4
        bnb_4bit_use_double_quant: true
        load_in_4bit: true
      #task: text-generation
    stop_words: null
  type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator

and


import torch
from haystack.components.generators import HuggingFaceLocalGenerator

llm = HuggingFaceLocalGenerator(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

anakin87 commented 7 months ago

I ran some experiments and it seems to be a bug. Let me add some information.

Using the generator on its own works well:

from haystack.components.generators import HuggingFaceLocalGenerator
import torch

llm = HuggingFaceLocalGenerator(
    model="microsoft/phi-2",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

llm.warm_up()

llm.run("What is the capital of Germany? Answer:")

Using the generator in a Pipeline causes errors (probably related to serialization):

from haystack.components.generators import HuggingFaceLocalGenerator
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
import torch

pipe = Pipeline()

prompt_template = "Answer the following question: {{question}}"
prompt_builder = PromptBuilder(prompt_template)

llm = HuggingFaceLocalGenerator(
    model="microsoft/phi-2",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

# llm.warm_up()
# llm.run("What is the capital of Germany? Answer:")

pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder", "llm")

print(pipe.run({"question": "What is the capital of Germany?"}))

AttributeError: module 'torch' has no attribute 'torch.bfloat16'
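
The traceback suggests that during deserialization the full string "torch.bfloat16" is looked up as an attribute of the torch module. A hypothetical sketch of that failure mode (not the actual Haystack code):

import torch

# The dtype survives YAML round-tripping as the literal string "torch.bfloat16"
dtype_str = "torch.bfloat16"

# Looking the whole string up on the torch module reproduces the error above:
#   getattr(torch, dtype_str)
#   AttributeError: module 'torch' has no attribute 'torch.bfloat16'

# Stripping the module prefix first recovers the real dtype object:
dtype = getattr(torch, dtype_str.removeprefix("torch."))
print(dtype)  # torch.bfloat16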

Asma-droid commented 7 months ago

Exactly, I have faced the same errors. As I'm using a serialized pipeline, I don't know how to solve the issue.

anakin87 commented 7 months ago

I have verified that this is the same problem reported in #7255; it is related to incorrect serialization in telemetry. We will fix this soon.

In the meantime, you can disable telemetry: export HAYSTACK_TELEMETRY_ENABLED="False"
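
If you prefer setting it from Python rather than in the shell, a minimal sketch (this assumes the variable is read when haystack is first imported, so set it before the import):

import os

# Set before importing haystack so telemetry is disabled from the start
os.environ["HAYSTACK_TELEMETRY_ENABLED"] = "False"

from haystack import Pipeline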

mrm1001 commented 6 months ago

Hi @Asma-droid, did the fix help solve your problem? I would love to hear more about how you're using Haystack.

Asma-droid commented 6 months ago

Yes, this fix solved my problem! Thanks a lot for the quick turnaround!

mrm1001 commented 6 months ago

Great news! Would you mind spending some time giving me more context about your use case? My email is maria.mestre@dataqa.ai.