amin-kh96 opened 1 month ago
To resolve the `OpenAIError` related to the API key when using a custom subclass of `BaseRagasEmbeddings` with the 'distilbert-base-uncased' model from Hugging Face, you should ensure that your custom embeddings class does not inadvertently require an OpenAI API key. Since you are already using a Hugging Face model, no OpenAI API key should be needed. The error suggests that somewhere in your code there is still a call or configuration that expects one.
Here are a few steps to troubleshoot and resolve the issue:
1. **Check the `evaluate` function:** Ensure that the `evaluate` function you are using does not internally require an OpenAI API key. If it does, you might need to modify it or use a different evaluation approach that does not depend on OpenAI services.
2. **Custom embeddings initialization:** When you initialize your `CustomHuggingFaceRagasEmbeddings`, make sure you are not passing any parameters or configurations that might trigger the use of OpenAI services. Your current implementation seems correct in this regard, as it uses Hugging Face's `AutoModel` and `AutoTokenizer`.
3. **Remove OpenAI dependencies:** Double-check your imports and any other parts of your code to ensure there are no lingering dependencies or calls to OpenAI's API. This includes checking any utility functions or libraries that might be used within the `evaluate` function.
4. **Use `HuggingfaceEmbeddings` directly:** If the issue persists, consider using the `HuggingfaceEmbeddings` class provided by the Ragas framework, which is specifically designed to work with Hugging Face models without requiring an OpenAI API key. Here's a quick example of how you can use it:
```python
from ragas.embeddings.base import HuggingfaceEmbeddings

# Create an instance of HuggingfaceEmbeddings with the desired model
embeddings = HuggingfaceEmbeddings(model_name='distilbert-base-uncased')

# Use the embed_query or embed_documents methods to get embeddings
text_embedding = embeddings.embed_query("Your text here")
document_embeddings = embeddings.embed_documents(["Document 1", "Document 2"])
```
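As a quick sanity check on what those calls return (768 is the hidden size of distilbert-base-uncased; other models differ):

```python
print(len(text_embedding))       # 768 dimensions for distilbert-base-uncased
print(len(document_embeddings))  # 2 vectors, one per input document
```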
By following these steps, you should be able to avoid the need for an OpenAI API key and resolve the error [1].
To continue talking to Dosu, mention @dosu.
I checked the `evaluate` function, and deep inside there is a function called `embedding_factory`. By default it uses OpenAI. I've put the whole function here:
```python
def embedding_factory(
    model: str = "text-embedding-ada-002",
    run_config: t.Optional[RunConfig] = None,
) -> BaseRagasEmbeddings:
    openai_embeddings = OpenAIEmbeddings(model=model)
    if run_config is not None:
        openai_embeddings.request_timeout = run_config.timeout
    else:
        run_config = RunConfig()
    return LangchainEmbeddingsWrapper(openai_embeddings, run_config=run_config)
```
Is it possible to change `embedding_factory`'s default OpenAI embedding when it is used inside the `evaluate` function?
Yes, you can modify the behavior of the `evaluate` function to use your custom embeddings instead of the default OpenAI embeddings provided by the `embedding_factory`. You can do this by passing your custom embeddings directly to the `evaluate` function. This bypasses the `embedding_factory` function, because the `evaluate` function checks whether embeddings are provided and uses them instead of creating new ones.
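For intuition, the relevant check inside `evaluate` has roughly this shape (a simplified sketch, not the exact Ragas source):

```python
# Simplified: the OpenAI default is only constructed when nothing is passed in
if embeddings is None:
    embeddings = embedding_factory()
# ...the metrics then use `embeddings`, whichever object was supplied
```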
Here's how you can do it:
Pass your custom embeddings to the `evaluate` function using the `embeddings` parameter. For example:
```python
# Assuming `custom_embeddings` is an instance of your CustomHuggingFaceRagasEmbeddings
evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=custom_embeddings)
```
By providing the `embeddings` parameter, the `evaluate` function will use your custom embeddings and will not call the `embedding_factory` function, thus avoiding the default OpenAI embeddings [1].
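Alternatively, since `embedding_factory` itself just wraps a LangChain embeddings object in `LangchainEmbeddingsWrapper`, you can do the same wrapping with a local Hugging Face model and pass the result in (a sketch; the exact `HuggingFaceEmbeddings` import path depends on your LangChain version):

```python
from langchain_huggingface import HuggingFaceEmbeddings  # or: from langchain_community.embeddings import HuggingFaceEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper

# Wrap a local model the same way embedding_factory wraps OpenAIEmbeddings
hf_embeddings = HuggingFaceEmbeddings(model_name="distilbert-base-uncased")
ragas_embeddings = LangchainEmbeddingsWrapper(hf_embeddings)

evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=ragas_embeddings)
```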
To continue talking to Dosu, mention @dosu.
I want to substitute the output of the function called `embedding_factory` with my computed embeddings, because I already have the computed embeddings for the answer, the question, and also the context in a Ragas data frame. I have every number (float) needed for the computation, but the Ragas `evaluate` function does the computation again.
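One way to express that substitution is a subclass that simply serves the precomputed vectors instead of computing anything; a minimal sketch, assuming a dict keyed by the raw text (the class name and cache layout are illustrative, and it assumes `embed_query`/`embed_documents` are the methods `BaseRagasEmbeddings` expects you to implement):

```python
from ragas.embeddings import BaseRagasEmbeddings


class PrecomputedRagasEmbeddings(BaseRagasEmbeddings):
    """Serve already-computed vectors; never call an embedding API."""

    def __init__(self, cache: dict):
        super().__init__()
        self.cache = cache  # maps each text to its precomputed embedding

    def embed_query(self, text: str) -> list:
        return self.cache[text]  # raises KeyError if a text was never embedded

    def embed_documents(self, texts: list) -> list:
        return [self.cache[t] for t in texts]
```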
I created a subclass of `BaseRagasEmbeddings` because I already have all the embeddings for context, query, and question. I did this to avoid using the OpenAI API key, because it is costly and I also want to use other models like Mistral. The model I used to create the embeddings is 'text-embedding-ada-002'. But I have been dealing with this error:

```
Exception has occurred: OpenAIError
The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
  File "C:\Users\Amin\OneDrive - unige.it\Desktop\tirocini\code\version01rageva.py", line 165, in <module>
    evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=custom_embeddings)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable.
```
As far as I know, I should not need an API key, so please help me fix this error. I have also put my code here for you to take a look at. Note that I am using a model from Hugging Face, and the model name is `model_name = 'distilbert-base-uncased'`:
```python
import json

import numpy as np
import torch
from datasets import Dataset
from transformers import AutoModel, AutoTokenizer

from ragas import evaluate
from ragas.embeddings import BaseRagasEmbeddings
from ragas.metrics import ContextUtilization, context_utilization

# Load the ground truth data (raw strings so the Windows backslashes are not treated as escapes)
file_path = r'assets\GT.json'
with open(file_path) as f:
    ground_truth_data = json.load(f)

# Load the question, the answer, and the chunks
file_path = r'assets\user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json'
with open(file_path) as f:
    llm = json.load(f)

# Initialize an empty list to hold the new dataset
data_set = []

# Iterate through the list and combine every two dictionaries
for i in range(0, len(llm), 2):
    combined_dict = {
        "text_vector_1": llm[i].get("text_vector", []),
        "text_vector_2": llm[i + 1].get("text_vector", []),
        "chunks": llm[i + 1].get("chunks", []),
    }
    data_set.append(combined_dict)


def map_chunks(data_set, ground_truth_data):
    for item in data_set:  # Iterate over each dictionary in data_set
        c = []  # Reset c for each item
        for chunk_id in item['chunks']:  # Loop through 'chunks' in the current dictionary
            for element in ground_truth_data:  # Loop through ground_truth_data
                if element['id'] == chunk_id:  # Match chunk_id with element's id
                    c.append(element['text_vector'])  # Append the matching text_vector to c
        item['chunks'] = c  # Replace the original 'chunks' (ids) with the mapped text_vector values
    return data_set  # Return the list so the assignment below does not end up as None


data_set = map_chunks(data_set, ground_truth_data)

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],  # Assuming this is a list of strings
        "contexts": entry["chunks"],  # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data],
}

model_name = 'distilbert-base-uncased'


class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custom_embeddings: list = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the
        specified model and custom embeddings.
        """
        ...


# Initialize the custom embeddings class
custom_embeddings = CustomHuggingFaceRagasEmbeddings(ragas_data)
ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name=model_name, custom_embeddings=custom_embeddings)

# Define the evaluation metrics
metrics = [context_utilization]


# Let's define a custom version of the evaluate function
def custom_evaluate(ragas_data, metrics, embeddings: BaseRagasEmbeddings):
    """Custom evaluation function that avoids using the OpenAI API."""
    results = {}
    return results


# Run the evaluation
evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=custom_embeddings)

# Print the evaluation results
print("RAGAS Evaluation Report:")
print(evaluation_report)
```