explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Too many LLM calls when generating with test_size=1 and distributions={simple: 1} #741

Closed mcapitanio closed 2 months ago

mcapitanio commented 5 months ago

I am trying to generate synthetic data using Azure OpenAI in a simple case:

test_dataset = generator.generate_with_langchain_docs(
            documents, test_size=1, with_debugging_logs=True, distributions={simple: 1})

I have configured the Azure LLM for the generator and critic as suggested in the documentation. When the generation starts, I see this in the logs:

INFO:httpx:HTTP Request: POST https://*****//openai/deployments/gpt-4-turbo/chat/completions?api-version=2023-10-01-preview "HTTP/1.1 200 OK"
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
INFO:httpx:HTTP Request: POST https://*****//openai/deployments/gpt-4-turbo/chat/completions?api-version=2023-10-01-preview "HTTP/1.1 200 OK"
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[... the same three lines repeat many more times ...]

and it continues until it hits the call limit and gets 429s, waits for the retry, resumes with more 200 responses, then hits 429 again, and so on. It never seems to stop.

Why this behaviour? Any ideas? How many LLM calls are expected for a test size of N with M documents? Is there any rule of thumb to estimate the cost?
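In the meantime, the only mitigation I can think of is throttling the requests. A minimal sketch, assuming ragas exposes RunConfig and that generate_with_langchain_docs accepts a run_config argument (the values below are placeholders, not recommendations):

from ragas.run_config import RunConfig

# Assumption: RunConfig fields are as in the ragas docs; values are placeholders.
run_config = RunConfig(
    max_workers=2,   # fewer concurrent requests against the Azure deployment
    max_retries=3,   # give up on 429s sooner
    max_wait=30,     # cap the backoff wait between retries
    timeout=60,
)

test_dataset = generator.generate_with_langchain_docs(
    documents, test_size=1, with_debugging_logs=True,
    distributions={simple: 1}, run_config=run_config)

This would only tame the 429s, though; it would not explain why the evolution keeps retrying.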

shahules786 commented 5 months ago

Hi @mcapitanio, a little more context would help. I can see that your chunks are scoring lower than the required threshold. Are you using test generation in a language other than English? If yes, check out https://docs.ragas.io/en/stable/howtos/applications/use_prompt_adaptation.html#language-adaptation-for-testset-generation. Otherwise, I would love to see the type of documents you're feeding into the library.
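The adaptation step from that guide looks roughly like this (a sketch; the language is a placeholder, cache_dir is optional, and the evolution list should match what you generate with):

from ragas.testset.evolutions import simple, reasoning, multi_context

# Translate the evolution prompts to the target language once, then cache
# them so subsequent runs reuse the adapted prompts.
generator.adapt(language="italian", evolutions=[simple, reasoning, multi_context])
generator.save(evolutions=[simple, reasoning, multi_context])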

mcapitanio commented 5 months ago

Hi @shahules786 ,

yes, I am using test generation for Italian. This is my code:

import os

from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import TokenTextSplitter
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset.docstore import InMemoryDocumentStore
from ragas.testset.evolutions import simple, reasoning, conditional, multi_context
from ragas.testset.extractor import KeyphraseExtractor
from ragas.testset.generator import TestsetGenerator


def generate(args,
             langchain_generation_llm: LangchainLLMWrapper,
             langchain_critic_llm: LangchainLLMWrapper,
             langchain_embeddings: LangchainEmbeddingsWrapper):

    try:
        loader = DirectoryLoader(path=args.documents_path, show_progress=True)

        documents = loader.load()
        # Derive a file name from each document's source path and store it
        # in the metadata.
        for document in documents:
            document.metadata['file_name'] = os.path.basename(
                document.metadata['source'])

        splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)
        keyphrase_extractor = KeyphraseExtractor(llm=langchain_generation_llm)
        docstore = InMemoryDocumentStore(
            splitter=splitter,
            embeddings=langchain_embeddings,
            extractor=keyphrase_extractor)

        generator = TestsetGenerator(generator_llm=langchain_generation_llm,
                                     critic_llm=langchain_critic_llm,
                                     docstore=docstore,
                                     embeddings=langchain_embeddings)

        # Adapt the evolution prompts to Italian and cache them locally.
        generator.adapt(language="italian", evolutions=[
                        simple, reasoning, conditional, multi_context], cache_dir=args.cache_path)
        generator.save(evolutions=[
                       simple, reasoning, multi_context, conditional], cache_dir=args.cache_path)

        test_dataset = generator.generate_with_langchain_docs(
            documents, test_size=args.test_size, with_debugging_logs=True, distributions={simple: 1})
        test_dataset.save("test_dataset.jsonl")

    except Exception as e:
        print(e)
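For completeness, the wrappers passed into generate() are built roughly like this (endpoint, deployment names, and API version below are placeholders for my real Azure configuration):

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper

# Placeholders: substitute the real Azure endpoint and deployment names.
# The API key is read from the AZURE_OPENAI_API_KEY environment variable.
azure_llm = AzureChatOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com/",
    azure_deployment="gpt-4-turbo",
    openai_api_version="2023-10-01-preview",
)
azure_embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://<resource>.openai.azure.com/",
    azure_deployment="<embeddings-deployment>",
    openai_api_version="2023-10-01-preview",
)

langchain_generation_llm = LangchainLLMWrapper(azure_llm)
langchain_critic_llm = LangchainLLMWrapper(azure_llm)
langchain_embeddings = LangchainEmbeddingsWrapper(azure_embeddings)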