explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

How to use the package without ChatGPT/OpenAI? #1448

Closed · nprime496 closed this issue 1 month ago

nprime496 commented 1 month ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

Hi, I am trying to generate test samples using ragas. I followed the documentation for custom models but I get an error regarding ChatGPT.

I never instantiated an OpenAI model and don't want to use one. What is this model used for, and how do I disable or replace it?

Code Examples

This community speaks code. Share your code snippets to help us understand your question better.

Here is the code:

from ragas.testset import TestsetGenerator
from langchain_ollama.llms import OllamaLLM
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import PyMuPDFLoader
import os 

# generator and critic LLMs served locally via Ollama
generator_llm = OllamaLLM(model="gemma2:2b")
critic_llm = OllamaLLM(model="gemma2:2b")

embeddings = FastEmbedEmbeddings()

pdf_path = os.path.join("documents", "sample.pdf")

loader = PyMuPDFLoader(pdf_path)

docs = loader.load()
generator = TestsetGenerator.from_langchain(generator_llm)  # , generator_llm, embeddings

dataset = generator.generate_with_langchain_docs(docs, test_size=10)

However, I get an error:

    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Additional context

None.

dosubot[bot] commented 1 month ago

To use the Ragas package without relying on ChatGPT or OpenAI, you can replace the OpenAI model with another language model supported by the package. Here's how you can do it:

  1. Ensure correct model usage: make sure you are not using any OpenAI-specific models or settings. In your code you are using OllamaLLM, which should be fine, but check that no other part of your code or environment is trying to use OpenAI.

  2. Check environment variables: ensure that the OPENAI_API_KEY environment variable is not set, as a stray key can trigger the use of OpenAI models (see the snippet below).

  3. Modify the TestsetGenerator initialization: make sure you pass all required parameters to TestsetGenerator.from_langchain. It looks like you commented some of them out; include them if needed:

    generator = TestsetGenerator.from_langchain(
       generator_llm=generator_llm,
       critic_llm=critic_llm,
       embeddings=embeddings
    )

By following these steps, you should be able to use the Ragas package without relying on OpenAI models [1][2].
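
For example, to make sure a stray key in your shell cannot leak into the process, you can clear it before importing ragas (a plain standard-library snippet illustrating step 2 above):

import os

# Drop any inherited OpenAI key so nothing silently falls back to the OpenAI client
os.environ.pop("OPENAI_API_KEY", None)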

To continue talking to Dosu, mention @dosu.

nprime496 commented 1 month ago

@dosu

 generator = TestsetGenerator.from_langchain( llm=generator_llm,critic_llm=generator_llm, embeddings=embeddings)#, document_loader=loader)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TestsetGenerator.from_langchain() got an unexpected keyword argument 'critic_llm'
dosubot[bot] commented 1 month ago

The TestsetGenerator.from_langchain method does not accept a critic_llm parameter in this version of ragas. Adjust your code to remove the critic_llm argument:

generator = TestsetGenerator.from_langchain(
    llm=generator_llm
    # Add other parameters if needed, but not critic_llm
)

Make sure to only use the parameters that are supported by the method [1].
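
If you are unsure which parameters your installed version accepts, you can check the signature directly (a quick standard-library sketch):

import inspect
from ragas.testset import TestsetGenerator

# Print the parameters accepted by the installed ragas version
print(inspect.signature(TestsetGenerator.from_langchain))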

To continue talking to Dosu, mention @dosu.

nprime496 commented 1 month ago

@dosu Let me rephrase my question.

The generate_with_langchain_docs method uses an OpenAI model by default through the transforms parameter. How do I replace it with a local LLM?

(method) def generate_with_langchain_docs(
    documents: Sequence[Document],
    test_size: int,
    transforms: Transforms | None = None,
    query_distribution: QueryDistribution | None = None,
    run_config: RunConfig | None = None,
    callbacks: Callbacks = None,
    with_debugging_logs: bool = False,
    raise_exceptions: bool = True
) -> Testset
dosubot[bot] commented 1 month ago

To replace the default OpenAI model in the generate_with_langchain_docs method's transforms attribute with a local LLM, you need to modify the llm_factory function. Here's how:

  1. Create an Instance of Your Local LLM: Initialize your local LLM.

  2. Wrap the Local LLM: Use the LangchainLLMWrapper to wrap your local LLM.

  3. Return the Wrapped LLM: Modify the llm_factory function to return your wrapped local LLM.

Here's an example:

def llm_factory(
    model: str = "gpt-4o-mini",
    run_config: t.Optional[RunConfig] = None,
    default_headers: t.Optional[t.Dict[str, str]] = None,
    base_url: t.Optional[str] = None,
) -> BaseRagasLLM:
    # model, default_headers, and base_url are ignored by this override;
    # it always returns the wrapped local LLM instead.

    # Initialize your local LLM here
    local_llm = YourLocalLLM()

    # Wrap the local LLM so it conforms to ragas' BaseRagasLLM interface
    return LangchainLLMWrapper(local_llm, run_config)

Replace YourLocalLLM() with the actual initialization of your local LLM. Ensure your local LLM is a subclass of BaseRagasLLM as required by Ragas [1][2][3].
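
For instance, with the local Ollama model already used in this thread, the wrapping step would look like this (a sketch; ChatOllama comes from the langchain_ollama package):

from langchain_ollama import ChatOllama
from ragas.llms import LangchainLLMWrapper

# Wrap the local chat model so it satisfies ragas' BaseRagasLLM interface
wrapped_llm = LangchainLLMWrapper(ChatOllama(model="gemma2:2b"))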

To continue talking to Dosu, mention @dosu.

nprime496 commented 1 month ago

@dosu I cannot change the code of the package, so how do I do this from a user's point of view? How do I craft a transforms parameter so that it uses the model I want?

dosubot[bot] commented 1 month ago

To use a specific local LLM model without modifying the package code, you can craft the transforms parameter by wrapping your custom LLM with LangchainLLMWrapper. Here's how:

  1. Initialize Your Local LLM: Import and set up your local LLM.

  2. Wrap the Local LLM: Use LangchainLLMWrapper to wrap your LLM.

  3. Set the Transforms Parameter: Pass the wrapped LLM in the transforms parameter.

Here's an example:

from your_local_llm_module import YourLocalLLM  # Import your local LLM
from ragas.llms import LangchainLLMWrapper

# Initialize your local LLM
local_llm = YourLocalLLM(
    # Add any necessary initialization parameters here
)

# Wrap your local LLM with LangchainLLMWrapper
wrapped_local_llm = LangchainLLMWrapper(local_llm)

# Use the wrapped LLM in the transforms parameter
transforms = {
    'llm': wrapped_local_llm,
    # Add other necessary transforms if needed
}

# Use the transforms in your method call
dataset = generator.generate_with_langchain_docs(docs, test_size=10, transforms=transforms)

This approach allows you to specify the model you want to use without altering the package code [1].

To continue talking to Dosu, mention @dosu.

nprime496 commented 1 month ago

@dosu

    raise ValueError(
ValueError: Invalid transforms type: <class 'dict'>. Expects a list of BaseGraphTransformations or a Parallel instance.

dosubot[bot] commented 1 month ago

To resolve the error, you need to provide the transforms parameter as a list of BaseGraphTransformation instances or a Parallel instance, not a dictionary. Here's how you can do it:

  1. Create Custom Transformations: Define your custom transformations by subclassing BaseGraphTransformation.

  2. Set the Transforms Parameter: Pass a list of these custom transformations or a Parallel instance containing them.

Example:

from ragas.testset.transforms.base import BaseGraphTransformation
from ragas.testset.transforms.engine import Parallel

class CustomTransformation(BaseGraphTransformation):
    def transform(self, knowledge_graph):
        # Implement your transformation logic here
        pass

    def generate_execution_plan(self, knowledge_graph):
        # Implement your execution plan here
        return []

# Create a list of transformations
custom_transforms = [CustomTransformation()]

# Alternatively, use Parallel to run transformations concurrently
parallel_transforms = Parallel(CustomTransformation())

# Use the transforms in your method call
dataset = generator.generate_with_langchain_docs(
    docs,
    test_size=10,
    transforms=custom_transforms  # or parallel_transforms
)

This setup ensures that your custom transformations are applied correctly [1][2].

To continue talking to Dosu, mention @dosu.

Spawnfile commented 1 month ago

Have you successfully executed your test without OpenAI? @nprime496

nprime496 commented 1 month ago

Hi @Spawnfile no I couldn't.

I found this documentation about using a different critic model, but the code does not work, so I had to create another issue: #1454

Also, it is all about a model developed by Exploding Gradients, so I wonder whether a generic model can be used (which is what I want).

jjmachan commented 1 month ago

@Spawnfile @nprime496 apologies for the confusion - dosu was incorrect here.

all you have to do is:

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# generate testset
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

now edit the llms with whichever models you want - in your case you only changed one llm, so Ragas will use the default llm for the critic (see the sketch below)
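
for example, swapping in the local models used earlier in this thread would look like this (a sketch with the same gemma2:2b model for both roles and FastEmbed embeddings):

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_ollama.llms import OllamaLLM
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

# generator and critic LLMs served locally via Ollama
generator_llm = OllamaLLM(model="gemma2:2b")
critic_llm = OllamaLLM(model="gemma2:2b")
embeddings = FastEmbedEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# generate testset with the same distribution as above
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)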

docs: https://docs.ragas.io/en/stable/getstarted/testset_generation.html https://docs.ragas.io/en/stable/howtos/customisations/bring-your-own-llm-or-embs.html

could you let me know if it works? feel free to ask any further questions here 🙂

nprime496 commented 1 month ago

Yes, it was my bad. I was using the wrong version of ragas.

nprime496 commented 1 month ago

@jjmachan Hi, I have a hard requirement to use langchain>0.3.

I am trying to use the v0.2 beta. How do I modify this code to make it work?

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

returns

  generator = TestsetGenerator.from_langchain(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TestsetGenerator.from_langchain() takes from 2 to 3 positional arguments but 4 were given

PS:

I tried to look into the code, but I don't understand it; I can't find the functionality to tune the critic model. This version of the code

from ragas.testset import TestsetGenerator
#from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_ollama.llms import OllamaLLM
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import PyMuPDFLoader
import os 

generator_llm = OllamaLLM(model="gemma2:2b")
critic_llm = OllamaLLM(model="gemma2:2b")
embeddings = FastEmbedEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    # critic_llm,
    # embeddings
)
pdf_path = os.path.join("documents", "sample.pdf")

loader = PyMuPDFLoader(pdf_path)

docs = loader.load()
# generate testset
#  distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}
testset = generator.generate_with_langchain_docs(docs, test_size=10)

returns

raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

with an error stack trace starting at:

  transforms = transforms or default_transforms()

meaning that it still tries to use GPT-4, just in a more complex way, through the default "transforms".

jjmachan commented 1 month ago

@nprime496 in that case, take a look at this: https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/

nprime496 commented 1 month ago

Hi @jjmachan, I followed the documentation. Here is my code, in accordance with https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/

from langchain_ollama import ChatOllama
from langchain_community.document_loaders import PyMuPDFLoader
import os 

from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

generator_llm = LangchainLLMWrapper(ChatOllama(model="gemma2:2b"))
pdf_path = os.path.join("documents", "sample.pdf")

loader = PyMuPDFLoader(pdf_path)

docs = loader.load()

generator = TestsetGenerator(llm=generator_llm)
dataset = generator.generate_with_langchain_docs(docs, test_size=10)

dataset.to_pandas().to_csv("testset.csv", index=False)

I get

    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
jjmachan commented 1 month ago

@nprime496 this is because the embeddings in the transforms default to OpenAI - thanks a lot for reporting this - I will get it fixed shortly 🙂

frr163 commented 4 weeks ago

Hello, is there any new progress? @jjmachan

jjmachan commented 3 weeks ago

@frr163 this has been fixed already (see attached screenshot)

do check out the docs here https://docs.ragas.io/en/latest/getstarted/rag_evaluation/#choosing-evaluator-llm
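
for reference, the documented pattern after the fix is to pass both a local LLM and a local embedding model explicitly, so no OpenAI default is ever constructed (a sketch based on the ragas v0.2 docs; note that the keyword changed from test_size in the beta to testset_size):

from langchain_ollama import ChatOllama
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import PyMuPDFLoader
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator

# wrap local models so ragas never falls back to OpenAI defaults
generator_llm = LangchainLLMWrapper(ChatOllama(model="gemma2:2b"))
generator_embeddings = LangchainEmbeddingsWrapper(FastEmbedEmbeddings())

docs = PyMuPDFLoader("documents/sample.pdf").load()

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10)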