deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Problem when using pipeline serialization #7277

Closed · Asma-droid closed this 6 months ago

Asma-droid commented 7 months ago

Hello,

I would like to use pipeline serialization. Below is my YAML for the llm:

llm:
    init_parameters:
      huggingface_pipeline_kwargs:
        model: mistralai/Mixtral-8x7B-Instruct-v0.1
        device_map: auto
        model_kwargs:
          bnb_4bit_compute_dtype: torch.bfloat16
          bnb_4bit_quant_type: nf4
          bnb_4bit_use_double_quant: true
          load_in_4bit: true
        #task: text-generation
      stop_words: null
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator

This code generates the error below:

~/.local/lib/python3.9/site-packages/bitsandbytes/nn/modules.py in forward(self, x)
    251         inp_dtype = x.dtype
    252         if self.compute_dtype is not None:
--> 253             x = x.to(self.compute_dtype)
    254 
    255         bias = None if self.bias is None else self.bias.to(self.compute_dtype)

RuntimeError: Invalid device string: 'torch.bfloat16'
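
For context, Tensor.to() treats an unrecognized string argument as a device name, which is why the literal string 'torch.bfloat16' produces this message. A minimal sketch (separate from my pipeline) that reproduces it:

import torch

# Tensor.to() parses a string argument as a device specifier; the literal
# string 'torch.bfloat16' is not a valid device name, hence:
# RuntimeError: Invalid device string: 'torch.bfloat16'
x = torch.ones(2)
x.to("torch.bfloat16")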

I don't know if there is another way to pass the compute dtype.

Best regards

anakin87 commented 7 months ago

Hey @Asma-droid...

Try with "torch.bfloat16" (quotes included).

Asma-droid commented 7 months ago

Hello @anakin87, I still have the same problem.

anakin87 commented 7 months ago

Please provide more information. Which specific beta version are you using? How did you generate this serialized pipeline? How are you deserializing it?

With this information, I will take a closer look tomorrow...

Asma-droid commented 7 months ago

@anakin87 Thanks a lot for the quick response!

I'm using haystack-ai 2.0.0b8.

My pipeline is:

# (document_store, prompt_builder and llm are defined elsewhere in my script)
from haystack import Pipeline
from haystack.components.builders import AnswerBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
query_pipeline.add_component("retriever", ElasticsearchEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", prompt_builder)
query_pipeline.add_component("llm", llm)
query_pipeline.add_component("AnswerBuilder", AnswerBuilder())

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "llm")
query_pipeline.connect("llm.replies", "AnswerBuilder")
# query_pipeline.connect("llm.meta", "AnswerBuilder")
query_pipeline.connect("retriever", "AnswerBuilder.documents")

I generated the serialized pipeline automatically using query_pipeline.dumps() and got:

components:
  AnswerBuilder:
    init_parameters:
      pattern: null
      reference_pattern: null
    type: haystack.components.builders.answer_builder.AnswerBuilder
  llm:
    init_parameters:
      generation_kwargs:
        max_new_tokens: 350
        return_full_text: false
      huggingface_pipeline_kwargs:
        device_map: auto
        model: mistralai/Mixtral-8x7B-Instruct-v0.1
        model_kwargs:
          bnb_4bit_compute_dtype: torch.bfloat16
          bnb_4bit_quant_type: nf4
          bnb_4bit_use_double_quant: true
          device_map: auto
          load_in_4bit: true
        task: text-generation
      stop_words: null
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
  prompt_builder:
    init_parameters:
      template: "\nAnswer the query based on the provided context. Please if the query\
        \ is in french, provide the answer in french also \nIf the context does not\
        \ contain the answer, say 'Answer not found'.\nContext:\n{% for doc in documents\
        \ %}\n  {{ doc.content }}\n{% endfor %}\nquery: {{query}}\nAnswer:\n"
    type: haystack.components.builders.prompt_builder.PromptBuilder
  retriever:
    init_parameters:
      document_store:
        init_parameters:
          embedding_similarity_function: cosine
          hosts: http://127.0.0.1:9200
          index: document-cfdt-new-asma
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
      filters: {}
      num_candidates: null
      top_k: 10
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
  text_embedder:
    init_parameters:
      batch_size: 32
      device:
        device: cuda:0
        type: single
      model: sentence-transformers/all-MiniLM-L6-v2
      normalize_embeddings: false
      prefix: ''
      progress_bar: true
      suffix: ''
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
connections:
- receiver: retriever.query_embedding
  sender: text_embedder.embedding
- receiver: prompt_builder.documents
  sender: retriever.documents
- receiver: AnswerBuilder.documents
  sender: retriever.documents
- receiver: llm.prompt
  sender: prompt_builder.prompt
- receiver: AnswerBuilder.replies
  sender: llm.replies
max_loops_allowed: 100
metadata: {}

I saved my pipeline to a file.
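
Roughly like this (a sketch; it assumes the dumps() output is written as-is to the path used below):

with open("./src/pipelines/rag_pipeline.yaml", "w") as f:
    # Persist the YAML produced by Pipeline.dumps()
    f.write(query_pipeline.dumps())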

Then I executed this code:

from haystack import Pipeline

with open("./src/pipelines/rag_pipeline.yaml", "rb") as f:
    rag_pipeline = Pipeline.loads(f.read())

I have also tried Pipeline.load(f).

I get the problem when I run:

query="What is Haystack?"
result = rag_piepeline.run(
     {"text_embedder": {"text": query},
                     "retriever": {"top_k": 10},
                     "prompt_builder": {"query": query},
                     "llm":{"generation_kwargs": {"max_new_tokens": 1000}},
                     "AnswerBuilder": {"query": query}})
anakin87 commented 7 months ago

Please also provide the code to initialize the llm.

Asma-droid commented 7 months ago

@anakin87

llm:
  init_parameters:
    huggingface_pipeline_kwargs:
      model: mistralai/Mixtral-8x7B-Instruct-v0.1
      device_map: auto
      model_kwargs:
        bnb_4bit_compute_dtype: torch.bfloat16
        bnb_4bit_quant_type: nf4
        bnb_4bit_use_double_quant: true
        load_in_4bit: true
      #task: text-generation
    stop_words: null
  type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator

and


import torch
from haystack.components.generators import HuggingFaceLocalGenerator

llm = HuggingFaceLocalGenerator(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

anakin87 commented 7 months ago

I ran some experiments and it seems to be a bug. Let me add some information.

Using the generator on its own works well:

from haystack.components.generators import HuggingFaceLocalGenerator
import torch

llm = HuggingFaceLocalGenerator(
    model="microsoft/phi-2",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

llm.warm_up()

llm.run("What is the capital of Germany? Answer:")

Using the generator in a Pipeline causes errors (probably related to serialization):

from haystack.components.generators import HuggingFaceLocalGenerator
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
import torch

pipe = Pipeline()

prompt_template = "Answer the following question: {{question}}"
prompt_builder = PromptBuilder(prompt_template)

llm = HuggingFaceLocalGenerator(
    model="microsoft/phi-2",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

# llm.warm_up()
# llm.run("What is the capital of Germany? Answer:")

pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder", "llm")

print(pipe.run({"question": "What is the capital of Germany?"}))

AttributeError: module 'torch' has no attribute 'torch.bfloat16'
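
The traceback suggests that during deserialization the full string "torch.bfloat16" is looked up as an attribute of the torch module. A hypothetical sketch of that failure mode (not the actual Haystack code):

import torch

# The dtype survives YAML round-tripping as the literal string "torch.bfloat16"
dtype_str = "torch.bfloat16"

# Looking the whole string up on the torch module reproduces the error above:
#   getattr(torch, dtype_str)
#   AttributeError: module 'torch' has no attribute 'torch.bfloat16'

# Stripping the module prefix first recovers the real dtype object:
dtype = getattr(torch, dtype_str.removeprefix("torch."))
print(dtype)  # torch.bfloat16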

Asma-droid commented 7 months ago

Exactly, I have faced the same errors. As I'm using a serialized pipeline, I don't know how to solve the issue.

anakin87 commented 7 months ago

I have verified that this is the same problem reported in #7255; it is related to incorrect serialization in telemetry. We will fix this soon.

In the meantime, you can disable telemetry: export HAYSTACK_TELEMETRY_ENABLED="False"
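
If you prefer setting it from Python rather than in the shell, a minimal sketch (this assumes the variable is read when haystack is first imported, so set it before the import):

import os

# Set before importing haystack so telemetry is disabled from the start
os.environ["HAYSTACK_TELEMETRY_ENABLED"] = "False"

from haystack import Pipeline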

mrm1001 commented 6 months ago

Hi @Asma-droid, did the fix help solve your problem? I would love to hear more about how you're using Haystack.

Asma-droid commented 6 months ago

Yes, this fix solved my problem! Thanks a lot for the quick turnaround!

mrm1001 commented 6 months ago

Great news! Would you mind spending some time giving me more context about your use case? My email is maria.mestre@dataqa.ai.