Closed · @Asma-droid closed this issue 6 months ago
Hey @Asma-droid...
Try with "torch.bfloat16" (quotes included).
Hello @anakin87. I still have the same problem.
Please provide more information. Which specific beta version are you using? How did you generate the serialized pipeline? How are you deserializing it?
With this information, I will take a closer look tomorrow...
@anakin87 Thanks a lot for your quick response!
I'm using haystack-ai 2.0.0b8.
My pipeline is:
```python
from haystack import Pipeline
from haystack.components.builders import AnswerBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

# document_store, prompt_builder and llm are defined elsewhere (llm shown below)
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
query_pipeline.add_component("retriever", ElasticsearchEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", prompt_builder)
query_pipeline.add_component("llm", llm)
query_pipeline.add_component("AnswerBuilder", AnswerBuilder())

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "llm")
query_pipeline.connect("llm.replies", "AnswerBuilder")
# query_pipeline.connect("llm.meta", "AnswerBuilder")
query_pipeline.connect("retriever", "AnswerBuilder.documents")
```
I generated the serialized pipeline automatically using `query_pipeline.dumps()`. I got:
```yaml
components:
  AnswerBuilder:
    init_parameters:
      pattern: null
      reference_pattern: null
    type: haystack.components.builders.answer_builder.AnswerBuilder
  llm:
    init_parameters:
      generation_kwargs:
        max_new_tokens: 350
        return_full_text: false
      huggingface_pipeline_kwargs:
        device_map: auto
        model: mistralai/Mixtral-8x7B-Instruct-v0.1
        model_kwargs:
          bnb_4bit_compute_dtype: torch.bfloat16
          bnb_4bit_quant_type: nf4
          bnb_4bit_use_double_quant: true
          device_map: auto
          load_in_4bit: true
        task: text-generation
      stop_words: null
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
  prompt_builder:
    init_parameters:
      template: "\nAnswer the query based on the provided context. Please if the query\
        \ is in french, provide the answer in french also \nIf the context does not\
        \ contain the answer, say 'Answer not found'.\nContext:\n{% for doc in documents\
        \ %}\n  {{ doc.content }}\n{% endfor %}\nquery: {{query}}\nAnswer:\n"
    type: haystack.components.builders.prompt_builder.PromptBuilder
  retriever:
    init_parameters:
      document_store:
        init_parameters:
          embedding_similarity_function: cosine
          hosts: http://127.0.0.1:9200
          index: document-cfdt-new-asma
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
      filters: {}
      num_candidates: null
      top_k: 10
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
  text_embedder:
    init_parameters:
      batch_size: 32
      device:
        device: cuda:0
        type: single
      model: sentence-transformers/all-MiniLM-L6-v2
      normalize_embeddings: false
      prefix: ''
      progress_bar: true
      suffix: ''
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
connections:
- receiver: retriever.query_embedding
  sender: text_embedder.embedding
- receiver: prompt_builder.documents
  sender: retriever.documents
- receiver: AnswerBuilder.documents
  sender: retriever.documents
- receiver: llm.prompt
  sender: prompt_builder.prompt
- receiver: AnswerBuilder.replies
  sender: llm.replies
max_loops_allowed: 100
metadata: {}
```
I have saved my pipeline in a file:
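(The save step itself isn't shown in the thread; a minimal sketch, assuming the `dumps()` string is written to the path used below:)

```python
# Hypothetical save step: write the serialized pipeline YAML to disk.
with open("./src/pipelines/rag_pipeline.yaml", "w") as f:
    f.write(query_pipeline.dumps())
```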
and I have executed this code:
```python
from haystack import Pipeline

with open("./src/pipelines/rag_pipeline.yaml") as f:
    rag_pipeline = Pipeline.load(f)
```
I have also tried `Pipeline.loads()`.
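For reference, the string-based variant would read the file first (assuming the 2.x API, where `load()` takes an open file object and `loads()` takes the serialized string):

```python
from haystack import Pipeline

# loads() parses a YAML string rather than reading from a file object.
with open("./src/pipelines/rag_pipeline.yaml") as f:
    rag_pipeline = Pipeline.loads(f.read())
```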
I get the problem when I run:
query="What is Haystack?"
result = rag_piepeline.run(
{"text_embedder": {"text": query},
"retriever": {"top_k": 10},
"prompt_builder": {"query": query},
"llm":{"generation_kwargs": {"max_new_tokens": 1000}},
"AnswerBuilder": {"query": query}})
Please also provide the code to initialize the llm.
@anakin87
```yaml
llm:
  init_parameters:
    huggingface_pipeline_kwargs:
      model: mistralai/Mixtral-8x7B-Instruct-v0.1
      device_map: auto
      model_kwargs:
        bnb_4bit_compute_dtype: torch.bfloat16
        bnb_4bit_quant_type: nf4
        bnb_4bit_use_double_quant: true
        load_in_4bit: true
      # task: text-generation
    stop_words: null
  type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
```
and the initialization code:

```python
import torch
from haystack.components.generators import HuggingFaceLocalGenerator

llm = HuggingFaceLocalGenerator(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)
```
I have run some experiments and it seems to be a bug; I will add some information below.
Using the generator on its own works well:
```python
import torch
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

pipe = Pipeline()
prompt_template = "Answer the following question: {{question}}"
prompt_builder = PromptBuilder(prompt_template)

llm = HuggingFaceLocalGenerator(
    model="microsoft/phi-2",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

llm.warm_up()
llm.run("What is the capital of Germany? Answer:")
```
Using the generator in a Pipeline causes errors (probably related to serialization)
```python
import torch
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

pipe = Pipeline()
prompt_template = "Answer the following question: {{question}}"
prompt_builder = PromptBuilder(prompt_template)
llm = HuggingFaceLocalGenerator(
    model="microsoft/phi-2",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
)

# llm.warm_up()
# llm.run("What is the capital of Germany? Answer:")

pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder", "llm")
print(pipe.run({"question": "What is the capital of Germany?"}))
```
```
AttributeError: module 'torch' has no attribute 'torch.bfloat16'
```
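The message is consistent with the serialized string being resolved via `getattr` on the torch module (an assumption about what the offending code path effectively does); the failure mode in isolation:

```python
import torch

dtype_name = "torch.bfloat16"  # the value as it appears in the serialized YAML
getattr(torch, dtype_name)     # AttributeError: module 'torch' has no attribute 'torch.bfloat16'
```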
Exactly, I have faced the same errors. As I'm using a serialized pipeline, I don't know how to solve the issue.
I have verified that this is the same problem described in #7255; it is related to incorrect serialization in telemetry. We will solve this soon.
In the meantime, you can disable telemetry:

```bash
export HAYSTACK_TELEMETRY_ENABLED="False"
```
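The same flag can be set from Python; setting it before importing haystack is the safe order:

```python
# Set the flag before importing haystack so telemetry never initializes.
import os
os.environ["HAYSTACK_TELEMETRY_ENABLED"] = "False"

from haystack import Pipeline  # imported after the flag is set
```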
Hi @Asma-droid, did the fix help solve your problem? I would love to hear more about how you're using Haystack.
Yes, this fix solved my problem! Thanks a lot for your quick response!
Great news! Would you mind spending some time giving me more context about your use case? My email is maria.mestre@dataqa.ai.
Hello,
I would like to use pipeline serialization. Below is my code for the llm (the `HuggingFaceLocalGenerator` initialization shown earlier in this thread). This code generates the following error:
```
~/.local/lib/python3.9/site-packages/bitsandbytes/nn/modules.py in forward(self, x)
    251         inp_dtype = x.dtype
    252         if self.compute_dtype is not None:
--> 253             x = x.to(self.compute_dtype)
    254
    255         bias = None if self.bias is None else self.bias.to(self.compute_dtype)

RuntimeError: Invalid device string: 'torch.bfloat16'
```
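The failing call is `x.to(self.compute_dtype)` with `compute_dtype` still being the raw string from the serialized config; `Tensor.to` interprets a bare string as a device name, which reproduces the error in isolation:

```python
import torch

x = torch.zeros(1)
x.to("torch.bfloat16")  # RuntimeError: Invalid device string: 'torch.bfloat16'
```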
I don't know if there is another way to pass the compute dtype.
Best regards