I'm trying to generate a simple test dataset with Ragas, but I keep getting IndexError: list index out of range. I normally use my own custom workflow to generate these datasets, but I recently wanted to explore Ragas's test set generation in addition to its evaluation features, so I'd like to understand whether I'm making a fundamental mistake.
Ragas version: 0.2.3
Python version: 3.10
Code
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
from ragas.testset import TestsetGenerator
generator = TestsetGenerator(llm=generator_llm,
                             embedding_model=generator_embeddings)
eval_dataset = generator.generate_with_langchain_docs(processed_docs[:3],
                                                      testset_size=3)
Here processed_docs[:3] is just three simple documents:
[Document(metadata={'title': 'Machine Learning', 'id': 1}, page_content='Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation systems and image recognition.'),
Document(metadata={'title': 'Deep Learning', 'id': 2}, page_content='Deep learning is a subset of machine learning utilizing neural networks with many layers. It excels in complex tasks like image and speech recognition. Convolutional and recurrent neural networks are among the common architectures used.'),
Document(metadata={'title': 'Natural Language Processing (NLP)', 'id': 3}, page_content='NLP is a branch of AI that enables computers to understand, interpret, and generate human language. Techniques include tokenization, stemming, and sentiment analysis. Applications range from chatbots to language translation services.')]
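For reference, here is a quick sketch that measures how short these three documents are. Whether document length matters to the generator is only a guess on my part, not something the traceback confirms. The dictionary below copies the page_content strings from above, and a whitespace word count is a rough stand-in for a real tokenizer.

```python
# Rough size check for the three sample documents shown above.
# Assumption: whitespace-separated words approximate document size;
# this is not how any particular tokenizer counts tokens.
page_contents = {
    "Machine Learning": (
        "Machine learning is a field of artificial intelligence focused on "
        "enabling systems to learn patterns from data. Algorithms analyze past "
        "data to make predictions or classify information. Popular applications "
        "include recommendation systems and image recognition."
    ),
    "Deep Learning": (
        "Deep learning is a subset of machine learning utilizing neural networks "
        "with many layers. It excels in complex tasks like image and speech "
        "recognition. Convolutional and recurrent neural networks are among the "
        "common architectures used."
    ),
    "Natural Language Processing (NLP)": (
        "NLP is a branch of AI that enables computers to understand, interpret, "
        "and generate human language. Techniques include tokenization, stemming, "
        "and sentiment analysis. Applications range from chatbots to language "
        "translation services."
    ),
}

# Map each title to its whitespace word count.
word_counts = {title: len(text.split()) for title, text in page_contents.items()}

for title, count in word_counts.items():
    print(f"{title}: {count} words")
```

Each document is only a few sentences long, so if the generator needs a minimum amount of text per document, these inputs may be too small to work with.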
Traceback
Generating Scenarios:   0% 0/3 [00:00<?, ?it/s]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-34-5896c6d0aa07> in <cell line: 5>()
3 generator = TestsetGenerator(llm=generator_llm,
4 embedding_model=generator_embeddings)
----> 5 eval_dataset = generator.generate_with_langchain_docs(processed_docs[:3],
6 testset_size=3)
18 frames
/usr/lib/python3.10/random.py in <listcomp>(.0)
517 floor = _floor
518 n += 0.0 # convert to float for a small speed improvement
--> 519 return [population[floor(random() * n)] for i in _repeat(None, k)]
520 try:
521 cum_weights = list(_accumulate(weights))
IndexError: list index out of range
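The last frame is inside random.choices in the standard library. As a minimal sketch, the same error can be reproduced by calling random.choices on an empty list. That suggests, but does not prove, that some list built earlier in the generation pipeline ended up empty. The 18 hidden frames do not show which list that is, and the helper name below is hypothetical, not part of Ragas.

```python
import random


def pick_with_replacement(population, k):
    """Hypothetical helper: a plain random.choices call with no weights."""
    return random.choices(population, k=k)


# Sampling from an empty population indexes population[0], which does not exist.
try:
    pick_with_replacement([], k=3)
except IndexError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)

# With a non-empty population the same call succeeds.
sample = pick_with_replacement(["a", "b", "c"], k=3)
print(len(sample))
```

If this is what happens here, the useful question is which step produced an empty list of candidates before the sampling call.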
I'd appreciate any insight into whether I'm using this feature incorrectly or whether there is a deeper issue here.