explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Evaluate function error #803

Closed lalehsg closed 7 months ago

lalehsg commented 7 months ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug: evaluate() throws an error. I had no issue last week! I believe I am using the correct column names. I would appreciate your help!

AttributeError: 'dict' object has no attribute 'rename_columns'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File <command-1197997617353352>, line 2
      1 from ragas import evaluate
----> 2 result = evaluate(fiqa_eval["baseline"], metrics=metrics, embeddings=dbx_embeddings, llm = chat_model)
      3 result

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-5e4c1c3b-00b6-4dea-9167-94454942f6b6/lib/python3.10/site-packages/ragas/evaluation.py:143, in evaluate(dataset, metrics, llm, embeddings, callbacks, is_async, run_config, raise_exceptions, column_map)
    140     metrics = [answer_relevancy, context_precision, faithfulness, context_recall]
    142 # remap column names from the dataset
--> 143 dataset = remap_column_names(dataset, column_map)
    144 # validation
    145 dataset = handle_deprecated_ground_truths(dataset)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-5e4c1c3b-00b6-4dea-9167-94454942f6b6/lib/python3.10/site-packages/ragas/validation.py:19, in remap_column_names(dataset, column_map)
     14 """
     15 Remap the column names in case dataset uses different column names
     16 """
     18 inverse_column_map = {v: k for k, v in column_map.items()}
---> 19 return dataset.rename_columns(inverse_column_map)

AttributeError: 'dict' object has no attribute 'rename_columns'

Ragas version: ragas==0.1.5

Python version:

Code to Reproduce

from datasets import Dataset
from langchain_community.chat_models import ChatDatabricks
from langchain_community.embeddings.databricks import DatabricksEmbeddings

dbx_embeddings = DatabricksEmbeddings(endpoint = "databricks-bge-large-en")

ds = Dataset.from_dict({"question": eval_questions, "answer": answers, "contexts":contexts, "ground_truth": ground_truth})

chat_model = ChatDatabricks(endpoint="siegpt-gpt-4-chat")

from ragas import evaluate
result = evaluate(fiqa_eval["baseline"], metrics=metrics, embeddings=dbx_embeddings, llm = chat_model)
result
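Note: the traceback above shows that the object reaching remap_column_names is a plain dict rather than a datasets.Dataset, which is why rename_columns is missing. A quick way to check what is actually being passed (a hedged sketch; fiqa_eval is simply whatever object the eval data was loaded into):

from datasets import Dataset

obj = fiqa_eval["baseline"]
print(type(obj))  # expected: datasets.Dataset, not dict

# If it turns out to be a plain dict of lists, wrap it before calling evaluate():
if isinstance(obj, dict):
    obj = Dataset.from_dict(obj)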

Error trace

Expected behavior I expected this to go through with no issue.


lalehsg commented 7 months ago

As far as I can see in the code, column_map is optional: https://github.com/explodinggradients/ragas/blob/962d40d1f5f45fe57266ec35176329b07dfc0ae5/src/ragas/evaluation.py#L86

Also, I believe I am already passing the correct column names, so there should be no need to rename them: https://github.com/explodinggradients/ragas/blob/962d40d1f5f45fe57266ec35176329b07dfc0ae5/src/ragas/evaluation.py#L112

lalehsg commented 7 months ago

Shouldn't there be a value check on the map here, so that remap_column_names is not called when the map is empty? Something like the sketch below.
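For illustration, a minimal sketch of what such a guard could look like (hypothetical code, not the actual ragas implementation in src/ragas/validation.py):

from datasets import Dataset

def remap_column_names(dataset: Dataset, column_map: dict) -> Dataset:
    # Skip the rename entirely when no remapping was requested.
    if not column_map:
        return dataset
    # Fail early with a clearer message if a non-Dataset slips through.
    if not isinstance(dataset, Dataset):
        raise ValueError(f"Expected a datasets.Dataset, got {type(dataset)}")
    inverse_column_map = {v: k for k, v in column_map.items()}
    return dataset.rename_columns(inverse_column_map)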

databinary commented 7 months ago

I have the same issue. @lalehsg, how did you resolve it?

lalehsg commented 7 months ago

The error was misleading; it was not the column names. I had changed my data source and, in doing so, broken the format of "contexts", which needs to be a list of lists. After fixing that, it went back to normal. Could this be your issue? Even if your contexts are empty, you should pass a [] for them.
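For anyone else hitting this, a minimal example of the expected shape (illustrative values only): "contexts" holds one list of strings per question, and an empty [] when nothing was retrieved.

from datasets import Dataset

data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris"],
    # one list of context strings per question; use [] if nothing was retrieved
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris"],
}
ds = Dataset.from_dict(data)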

brito-bernardo commented 4 months ago

I'm still getting this error and have no idea why.

JANHMS commented 3 months ago

Is anyone able to resolve this? I am getting the same issue.

emilycsj commented 3 months ago

I tried passing in an empty list but am still getting the same error... Was anyone able to resolve this?

jjmachan commented 3 months ago

@emilycsj @JANHMS @brito-bernardo @databinary were you able to resolve this?

From what I can understand from @lalehsg, the misleading error message is the problem and the root cause is the Dataset format. We do have checks for that too, but maybe they are not catching these errors.

Can you check whether your dataset format is the problem?

jjmachan commented 3 months ago

Here is a small util function that might help:

from datasets import Dataset, Features, Sequence, Value, ClassLabel, Array2D, Array3D, Array4D, Array5D, Audio, Image

def get_dataset_types(dataset):
    if not isinstance(dataset, Dataset):
        raise ValueError("Input must be a Dataset object")

    def get_feature_type(feature):
        if isinstance(feature, Value):
            return f"Value({feature.dtype})"
        elif isinstance(feature, ClassLabel):
            return "ClassLabel"
        elif isinstance(feature, (Array2D, Array3D, Array4D, Array5D)):
            return f"{feature.__class__.__name__}({feature.dtype})"
        elif isinstance(feature, Audio):
            return f"Audio({feature.sampling_rate}Hz)"
        elif isinstance(feature, Image):
            return "Image"
        elif isinstance(feature, Sequence):
            return f"Sequence({get_feature_type(feature.feature)})"
        elif isinstance(feature, dict):
            return {k: get_feature_type(v) for k, v in feature.items()}
        else:
            return str(type(feature))

    return {name: get_feature_type(feature) for name, feature in dataset.features.items()}
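For example, on a dataset built with the four standard columns (illustrative values), it should print something like Value(string) for the text columns and Sequence(Value(string)) for contexts:

ds = Dataset.from_dict({
    "question": ["q1"],
    "answer": ["a1"],
    "contexts": [["ctx1", "ctx2"]],
    "ground_truth": ["gt1"],
})
print(get_dataset_types(ds))
# e.g. {'question': 'Value(string)', 'answer': 'Value(string)',
#       'contexts': 'Sequence(Value(string))', 'ground_truth': 'Value(string)'}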

I need your help to figure out the root cause, and then we'll make the error messages better 🙂

DressPD commented 3 months ago

Solution: transform the synthetic data into a DataFrame and back into a Dataset:

from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness
from ragas.run_config import RunConfig
from ragas.testset.evolutions import simple, reasoning, multi_context

testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
testset = testset.to_pandas()

from datasets import Dataset

questions = testset["question"].tolist()
answers = []
contexts = []

# Inference
for query in questions:
    # Call the RAG pipeline
    response = rag_pipeline(query)
    # Extract only the result text
    formatted_result = response['result']
    answers.append(formatted_result)
    contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])

# To dict
# Note: this reuses the contexts from the generated testset; the retrieved
# contexts collected in the loop above are not used here.
data = {
    "question": testset["question"].tolist(),
    "answer": answers,
    "contexts": testset["contexts"].tolist(),
    "ground_truth": testset["ground_truth"].tolist()
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)
print(dataset)

result = evaluate(
    dataset = dataset,
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
    llm=llm_openai,
    embeddings=embeddings_openai,
    run_config=RunConfig(max_workers=5)
)

df_eval = result.to_pandas()

@jjmachan I suppose you should review generator.generate_with_langchain_docs, as it is not creating the format expected by ragas' evaluate() method.

brito-bernardo commented 3 months ago

To be honest, I've tried multiple datasets, even building one manually, but it didn't work at all. Maybe I'll try this approach from @DressPD.

jjmachan commented 3 months ago

@DressPD there is actually a method called to_dataset() that could help here, but I agree there is some boilerplate code you have to write.
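For reference, a hedged sketch of that shorter path, assuming the test set object returned by generate_with_langchain_docs exposes the to_dataset() method mentioned above (you would still add your pipeline's answers before evaluating):

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
dataset = testset.to_dataset()  # a datasets.Dataset, no pandas round-trip needed
print(dataset.column_names)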

Can I ask you something, @DressPD @brito-bernardo @emilycsj @JANHMS @databinary:

If you were to design what evaluate() takes in, what would it look like?

Senthselvi commented 2 months ago

testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
testset = testset.to_pandas()

What is the equivalent code for LlamaIndex?