explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

get keyphrases from TestGenerator #1131

Open ableiweiss opened 2 months ago

ableiweiss commented 2 months ago

Is it possible to get the keyphrases extracted by TestGenerator using the API?

dosubot[bot] commented 2 months ago

To get the keyphrases extracted by TestGenerator using the API, you can use the KeyphraseExtractor class and its extract method. Here is an example of how you can do this:

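import asyncio

from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.llms.base import BaseRagasLLM
from ragas.testset.docstore import Node
from ragas.testset.extractor import KeyphraseExtractor


async def extract_keyphrases(llm: BaseRagasLLM, text: str) -> list:
    # Wrap the raw text in a Node, the unit the extractor works on
    node = Node(page_content=text)

    # Create an instance of KeyphraseExtractor backed by the given LLM
    keyphrase_extractor = KeyphraseExtractor(llm=llm)

    # extract() is a coroutine, so it has to be awaited inside an async function
    return await keyphrase_extractor.extract(node=node, is_async=True)


# BaseRagasLLM is abstract and cannot be instantiated directly, so use a concrete
# subclass. Wrapping ChatOpenAI is one option (assumes langchain_openai is
# installed and OPENAI_API_KEY is set); any concrete BaseRagasLLM should work.
llm = LangchainLLMWrapper(ChatOpenAI())

keyphrases = asyncio.run(extract_keyphrases(llm, "Your text content here"))
print(keyphrases)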

This code initializes the KeyphraseExtractor with an instance of BaseRagasLLM and a Node containing the text content. It then calls the extract method to get the keyphrases. The extract method returns a list of keyphrases extracted from the text content.
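Because extract is awaited, several nodes can be processed concurrently from ordinary synchronous code. The following is only a minimal sketch of that pattern: StubNode and StubKeyphraseExtractor are hypothetical stand-ins (not part of ragas) that mimic the extract(node=..., is_async=...) call shape shown above, so the snippet runs without an LLM. A real KeyphraseExtractor could presumably be passed to extract_all in the same way.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class StubNode:
    """Hypothetical stand-in for a ragas Node; only page_content is used."""
    page_content: str


class StubKeyphraseExtractor:
    """Hypothetical stand-in with the same call shape as extract(node=..., is_async=...)."""

    async def extract(self, node: StubNode, is_async: bool = True) -> List[str]:
        await asyncio.sleep(0)  # yield control, as a real LLM call would
        # Toy rule for illustration only: treat capitalized words as keyphrases.
        return [w.strip(".,") for w in node.page_content.split() if w[:1].isupper()]


async def extract_all(extractor, nodes) -> List[List[str]]:
    """Run extract() on every node concurrently; results keep the input order."""
    return await asyncio.gather(
        *(extractor.extract(node=n, is_async=True) for n in nodes)
    )


def main() -> None:
    nodes = [
        StubNode("Ragas evaluates RAG pipelines."),
        StubNode("keyphrases come from an LLM."),
    ]
    results = asyncio.run(extract_all(StubKeyphraseExtractor(), nodes))
    for node, phrases in zip(nodes, results):
        print(node.page_content, "->", phrases)


if __name__ == "__main__":
    main()
```

asyncio.gather returns results in the order the coroutines were passed in, so each result list lines up with its node.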

Additionally, the TestsetGenerator class provides a convenient way to initialize the KeyphraseExtractor when creating a document store. Here is an example of how you can use TestsetGenerator to set up the keyphrase extraction:

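from langchain.text_splitter import TokenTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset.docstore import InMemoryDocumentStore
from ragas.testset.extractor import KeyphraseExtractor
from ragas.testset.generator import TestsetGenerator

# BaseRagasLLM and BaseRagasEmbeddings are abstract, so use concrete wrappers.
# The OpenAI models are only an example (assumes langchain_openai is installed
# and OPENAI_API_KEY is set); substitute your own LLM and embeddings.
generator_llm = LangchainLLMWrapper(ChatOpenAI())
critic_llm = LangchainLLMWrapper(ChatOpenAI())
embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# The document store needs a text splitter and is given the extractor to run
# on its nodes (constructor arguments as in ragas 0.1.x; the chunk size is an
# arbitrary example value).
docstore = InMemoryDocumentStore(
    splitter=TokenTextSplitter(chunk_size=1024, chunk_overlap=0),
    embeddings=embeddings,
    extractor=KeyphraseExtractor(llm=generator_llm),
)

# Create a TestsetGenerator instance
testset_generator = TestsetGenerator(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
    docstore=docstore,
)

# The docstore held by testset_generator now carries the keyphrase extractor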

This approach ensures that the KeyphraseExtractor is properly integrated into the document store, making it easier to manage and use within your RAG pipeline [1][2][3].
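Once a document store has run its extractor, the keyphrases still have to be read back from the nodes. The sketch below assumes each node exposes page_content and keyphrases attributes, which appears to be the case for the ragas 0.1.x Node but is not confirmed in this thread. collect_keyphrases is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without ragas.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List


def collect_keyphrases(nodes: Iterable) -> Dict[str, List[str]]:
    """Map each node's text to its keyphrases, skipping nodes that have none."""
    result: Dict[str, List[str]] = {}
    for node in nodes:
        # Assumed attribute name; tolerate nodes where it is missing or None.
        phrases = list(getattr(node, "keyphrases", None) or [])
        if phrases:
            result[node.page_content] = phrases
    return result


# Stand-in nodes for illustration; real nodes would come from the docstore.
demo_nodes = [
    SimpleNamespace(page_content="Ragas generates test sets.", keyphrases=["Ragas", "test sets"]),
    SimpleNamespace(page_content="Not processed yet.", keyphrases=[]),
]

print(collect_keyphrases(demo_nodes))
```

With a real generator, the call would presumably look like collect_keyphrases(testset_generator.docstore.nodes) after documents have been added, assuming the docstore exposes its nodes as a list.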


jjmachan commented 2 months ago

@ableiweiss that is an interesting suggestion. How are you planning to use it?

We are working on a revamp of testset generation (#1016) and will add it to that, but if you could explain your reasoning and use case, it would help us a lot 🙂