explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Inconsistency between data generation and evaluation #290

Closed brunopistone closed 5 months ago

brunopistone commented 10 months ago

Describe the bug
I'm following the documentation for creating a synthetic dataset here. The generated dataset contains the following columns: ['question', 'context', 'answer', 'question_type', 'episode_done'].

The evaluation requires:

  1. a column "contexts", which is not present in the dataset generated by following your guide
  2. the column "contexts" to be of type Sequence[string]; by following your guide, the column is a plain string

I expected your library to provide all the steps for both data generation and evaluation, but the two seem quite inconsistent.

Ragas version: 0.0.20
Python version: 3.10

Code to Reproduce
Just follow your documentation.

Error trace
ValueError: Dataset feature "contexts" should be of type Sequence[string], got <class 'datasets.features.features.Value'>

Expected behavior
The expected behavior is that:

  1. the synthetic data generator produces a dataset that can be used directly for evaluation
  2. evaluation accepts what is produced in step 1 (see the sketch below)
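
For illustration, here is a minimal, self-contained reproduction of the type mismatch (the values are made up; only the shapes matter):

from datasets import Dataset

# A plain string per row is inferred as Value("string"), not the
# Sequence[string] that the evaluation's input validation expects
ds = Dataset.from_dict({
    "question": ["What is ragas?"],
    "contexts": ["a single context string"],  # should be a list of strings per row
    "answer": ["An evaluation framework for RAG pipelines."],
})
print(ds.features["contexts"])  # Value(dtype='string') -> the ValueError above
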
shahules786 commented 10 months ago

Hey @brunopistone, thanks for bringing this to our notice. We will raise a fix ASAP, but feel free to raise a PR if you'd like to contribute :)

brunopistone commented 10 months ago

Hello @shahules786, I opened PR #300, which contains the fixes and also adds compatibility with Amazon API Gateway.

Antupis commented 9 months ago

What is the situation with this now? At the moment, test set generation returns

DataRow = namedtuple(
    "DataRow",
    [
        "question",
        "ground_truth_context",
        "ground_truth",
        "question_type",
        "episode_done",
    ],
)

which does not match the columns that the metrics expect.
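
In the meantime, one possible workaround is to rename the generated columns to the names evaluation expects. A sketch (assuming the DataRow fields above are what the generator emits; whether a plain rename is the intended mapping is my assumption):

from datasets import Dataset

# Toy rows shaped like the generator's DataRow fields above
ds = Dataset.from_dict({
    "question": ["What is ragas?"],
    "ground_truth_context": [["some retrieved passage"]],
    "ground_truth": [["a reference answer"]],
    "question_type": ["simple"],
    "episode_done": [True],
})

# Map the generator's column names onto the ones the metrics look for
ds = ds.rename_column("ground_truth_context", "contexts")
ds = ds.rename_column("ground_truth", "ground_truths")
print(ds.features)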

pankssid commented 7 months ago

It's working...

from datasets import Dataset
import pandas as pd

# Assuming your CSV file is named 'eval.csv'
csv_file_path = 'eval.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_file_path)

# Parse the stringified list in 'contexts' back into a real Python list
df['contexts'] = df['contexts'].apply(lambda x: eval(x) if isinstance(x, str) else [])

# Create a dictionary with the specified features
data_dict = {
    'question': df['question'].tolist(),
    'ground_truths': df['ground_truths'].tolist(),
    'contexts': df['contexts'].tolist(),
    'answer': df['answer'].tolist(),
    'evolution_type': df['evolution_type'].tolist(),
    'episode_done': df['episode_done'].tolist()
}

# Create a Dataset from the dictionary
fiqa_dataset = Dataset.from_dict(data_dict)

# Print information about the created dataset
print(fiqa_dataset)
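
(Side note: ast.literal_eval from the standard library is a safer drop-in for eval when parsing stringified lists out of a CSV.)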

stepkurniawan commented 6 months ago

I tried @pankssid's way of converting my own DataFrame to a dict and then to a Dataset. It's still not working.

my dict:

{'question': ['What is the purpose of Network Analysis?'],
 'ground_truths': ['Network Analysis is conducted to understand connections and distances between data points by arranging data in a network structure.'],
 'contexts': [["list of potentially relevant individuals. In the ''free list'' approach, they are asked to recall individuals without seeing a list (Butts 2008, p.20f).\\n* '''Data Analysis''': When it comes to analyzing the gathered data, there are different network properties that researchers are interested in in accordance with their research questions. The analysis may be qualitative as well as quantitative, focusing either on the structure and quality of connections or on their quantity and values. (Marin & Wellman 2010, p.16; Butts 2008, p.21f). The analysis can focus on \\n** the quantity and quality of ties that connect to individual nodes\\n** the similarity between different nodes, or\\n** the structure of the network as a whole in terms of density, average connection length and strength or network composition.\\n* An important element of the analysis is not just the creation of quantitative or qualitative insights, but also the '''visual representation''' of the network. For",
   'Analysis gained even more traction through the increasing application in fields such as geography, economics and linguistics. Sociologists engaging with Social Network Analysis remained to come from different fields and topical backgrounds after that. Two major research areas today are community studies and interorganisational relations (Scott 1988; Borgatti et al. 2009). However, since Social Network Analysis allows to assess many kinds of complex interaction between entities, it has also come to use in fields such as ecology to identify and analyze trophic networks, in computer science, as well as in epidemiology (Stattner & Vidot 2011, p.8).\\n\\n\\n== What the method does ==\\n"Social network analysis is neither a theory nor a methodology. Rather, it is a perspective or a paradigm." (Marin & Wellman 2010, p.17) It subsumes a broad variety of methodological approaches; the fundamental ideas will be presented hereinafter.\\n\\nSocial Network Analysis is based on',
   'style="width: 33%"| \\\'\\\'\\\'[[:Category:Past|Past]]\\\'\\\'\\\' || style="width: 33%"| \\\'\\\'\\\'[[:Category:Present|Present]]\\\'\\\'\\\' || [[:Category:Future|Future]]\\n|}\\n<br/>__NOTOC__\\n<br/>\\n\\n\\\'\\\'\\\'In short:\\\'\\\'\\\' Social Network Analysis visualises social interactions as a network and analyzes the quality and quantity of connections and structures within this network.\\n\\n== Background ==\\n[[File:Scopus Results Social Network Analysis.png|400px|thumb|right|\\\'\\\'\\\'SCOPUS hits per year for Social Network Analysis until 2019.\\\'\\\'\\\' Search terms: \\\'Social Network Analysis\\\' in Title, Abstract, Keywords. Source: own.]]\\n\\n\\\'\\\'\\\'One of the originators of Network Analysis was German philosopher and sociologist Georg Simmel\\\'\\\'\\\'. His work around the year 1900 highlighted the importance of social relations when understanding social systems, rather than focusing on individual units. He argued "against understanding society as a mass of individuals who']],
 'answer': [' Network Analysis is a method to visual and analyze social interactions, such as connections between individuals, to answer research questions in fields such as sociology, ecology, computer science, and more. It can be qualitative or quantitative, and can focus on the structure, quality, or quantity of connections. It can also be visual, to make the network and connections more understanding.']}

Still got an error:

ValueError: Dataset feature "ground_truths" should be of type Sequence[string], got <class 'datasets.features.features.Value'>

mabelli96 commented 6 months ago

@stepkurniawan "ground_truths" expects a list of strings per row, similar to "contexts". I believe you need to convert each value in your 'ground_truths' column to a list, something like DF['ground_truths'] = DF['ground_truths'].apply(lambda x: [x]). Then it will work like a charm.
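
A minimal sketch of that fix (assuming a pandas DataFrame df with the columns from the dict above):

import pandas as pd

df = pd.DataFrame({
    "question": ["What is the purpose of Network Analysis?"],
    "ground_truths": ["Network Analysis is conducted to understand connections between data points."],
})

# Wrap each ground-truth string in a one-element list so the datasets
# library infers Sequence[string] instead of a plain string Value
df["ground_truths"] = df["ground_truths"].apply(lambda x: [x] if isinstance(x, str) else x)
print(df["ground_truths"].iloc[0])  # a one-element list, as required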

koshyviv commented 5 months ago

I thought I was lazying out of reading some documentation - the docs mention synthetic data generation and the next step mentions evaluation, but the evaluation never refers back to the generated dataset.

Can we consider moving Generate a Synthetic Test Set from the Get Started page to, let's say, Core Concepts until this is fixed? I guess more folks may get confused when reading - it feels like the evaluation is a continuation of the synthetic generation.

mtharrison commented 5 months ago

Just commenting that I found the same thing confusing 👆

shahules786 commented 5 months ago

Hey @mtharrison let me go through this thread today and raise a fix by EOD.

shahules786 commented 5 months ago

Bros @mtharrison @koshyviv @stepkurniawan @brunopistone Does this image make sense to you? Is it good enough to add to the docs? (My drawing skills are subpar, as you can see, haha)

[Image: Screenshot 2024-03-27 at 10 32 12 PM]

mtharrison commented 5 months ago

@shahules786 this image does very much answer the questions that I had, yes! Thank you.

I took a look at your PR too. I think it's an improvement, however I have a couple of other points:

I think having contexts but not answer in the generated test set is what caused confusion for me.

Also, I think if you were to add to that diagram that the question and ground_truth could alternatively be manually written (or supplemented) by people with domain knowledge of your data, it would really explain 100% of what this library does 😄

Something like this:

[Attachment: flow ragas.excalidraw.zip]

koshyviv commented 5 months ago

> Bros @mtharrison @koshyviv @stepkurniawan @brunopistone Does this image make sense to you? Is it good enough to add to the docs? (My drawing skills are subpar, as you can see, haha)

Your image skills are great! 😄

Just want to add to the discussion that, from a newcomer's perspective, the execution flow can still be improved. For example, I run the test-case generation script and it gives its output as a CSV file. Then I prepare to run the evaluation script - but it expects a Dataset instead.

I think if we had a flow like the one below (which I feel was the primary motivation behind this issue), users would be able to realize the benefits of the library quite quickly:

  1. Install ragas
  2. Create a test CSV using the provided script
  3. Evaluate the pipeline using the CSV output of step 2

I'm not sure if I'm making complete sense, but the hurdle of disconnected inputs/outputs between test generation and evaluation was the primary concern for me.

Currently, I'm using @pankssid's and others' version to convert the CSV into a Dataset:

import pandas as pd
from datasets import Dataset
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# `rag` (my RAG pipeline) and `langchain_embeddings` are defined elsewhere in my project

def get_file_dataset(path="test.csv"):
    # Load the generated test set and parse the stringified 'contexts' lists
    df = pd.read_csv(path)
    df['contexts'] = df['contexts'].apply(lambda x: eval(x) if isinstance(x, str) else [])
    data_dict = {
        'question': df['question'].tolist(),
        'ground_truth': df['ground_truth'].astype(str).tolist(),
        'contexts': df['contexts'].tolist(),
        'evolution_type': df['evolution_type'].tolist(),
        'episode_done': df['episode_done'].tolist()
    }
    custom_dataset = Dataset.from_dict(data_dict)
    return custom_dataset, data_dict

def generate_responses(test_questions, test_answers):
    # Run each generated question through the RAG pipeline to collect
    # answers and retrieved contexts for evaluation
    answers = []
    contexts = []
    for q in test_questions:
        res = rag.ask_chain(q)
        answer = res["response"]
        context = res["docs"]
        print(f"Question: {q}\n Answer: {answer}")
        answers.append(answer)
        contexts.append(context)
    dataset_dict = {
        "question": test_questions,
        "answer": answers,
        "contexts": contexts,
    }
    if test_answers is not None:
        dataset_dict["ground_truth"] = test_answers
    ds = Dataset.from_dict(dataset_dict)
    return ds

def get_dataset():
    _, ddict = get_file_dataset()
    ds = generate_responses(ddict['question'], ddict['ground_truth'])
    return ds

result = evaluate(
    get_dataset(),
    metrics=[
        answer_relevancy,
        faithfulness,
        context_precision,
        context_recall,
    ],
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    embeddings=langchain_embeddings,
)

mtharrison commented 5 months ago

@koshyviv you can call to_dataset() on the generation output to get a dataset:

testset = generator.generate_with_langchain_docs(data, test_size=50, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

dataset = testset.to_dataset()
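
(If you'd rather work with a DataFrame, the generated test set also exposes a to_pandas() helper, if I'm not mistaken.)
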
koshyviv commented 5 months ago

Thanks! I'm still exploring the library and this helps. This could be a good addition to the Getting Started section.

shahules786 commented 5 months ago

Hey @mtharrison @koshyviv, thanks for your input. I am with you on this 100%. @mtharrison, may I use your image for the ragas docs?

mtharrison commented 5 months ago

@shahules786 sure thing!

shahules786 commented 5 months ago

Guys, I just updated the docs with the changes - hope that helps. If not, feel free to reopen the issue. https://docs.ragas.io/en/latest/