Open ableiweiss opened 2 months ago
To get the keyphrases extracted by TestsetGenerator using the API, you can use the KeyphraseExtractor class and its extract method. Here is an example of how you can do this:
import asyncio

from ragas.llms.base import BaseRagasLLM
from ragas.testset.docstore import Node
from ragas.testset.extractor import KeyphraseExtractor

# BaseRagasLLM is a base class shown as a placeholder; substitute a concrete LLM instance
llm = BaseRagasLLM()  # Initialize your LLM instance
node = Node(page_content="Your text content here")  # Initialize your Node with the text content

# Create an instance of KeyphraseExtractor
keyphrase_extractor = KeyphraseExtractor(llm=llm)

async def main():
    # extract is awaited, so the call has to live inside an async function
    keyphrases = await keyphrase_extractor.extract(node=node, is_async=True)
    print(keyphrases)

asyncio.run(main())
This code initializes the KeyphraseExtractor with an instance of BaseRagasLLM and creates a Node containing the text content. It then calls the extract method with that Node. The extract method returns a list of keyphrases extracted from the text content.
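Because extract is awaited in the snippet above, it has to be driven by an event loop. The following is a minimal sketch of that calling pattern only. StubNode and StubKeyphraseExtractor are hypothetical stand-ins invented for illustration (they are not ragas classes and do not call an LLM), so the example runs without any external service.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class StubNode:
    """Hypothetical stand-in for a document node; it only carries text."""
    page_content: str


class StubKeyphraseExtractor:
    """Hypothetical stand-in with the same call shape as the snippet above.

    It does NOT call an LLM: it returns the three longest distinct words
    so the example can run offline.
    """

    async def extract(self, node: StubNode, is_async: bool = True) -> List[str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        words = {w.strip(".,").lower() for w in node.page_content.split()}
        return sorted(words, key=lambda w: (-len(w), w))[:3]


async def extract_all(
    extractor: StubKeyphraseExtractor, nodes: List[StubNode]
) -> List[List[str]]:
    # One extract() coroutine per node, run concurrently; results keep input order.
    return await asyncio.gather(*(extractor.extract(node=n) for n in nodes))


def run_extraction(texts: List[str]) -> List[List[str]]:
    # Synchronous entry point: asyncio.run drives the coroutines to completion.
    nodes = [StubNode(page_content=t) for t in texts]
    return asyncio.run(extract_all(StubKeyphraseExtractor(), nodes))


if __name__ == "__main__":
    print(run_extraction(["Retrieval augmented generation pipelines", "Keyphrase extraction"]))
```

In a plain script, asyncio.run is the usual way to start the coroutine. In a Jupyter notebook an event loop is already running, so you would await the call directly instead.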
Additionally, the TestsetGenerator class provides a convenient way to initialize the KeyphraseExtractor when creating a document store. Here is an example of how you can use TestsetGenerator to set up the keyphrase extraction:
from ragas.testset.generator import TestsetGenerator
from ragas.llms import BaseRagasLLM
from ragas.embeddings.base import BaseRagasEmbeddings
from ragas.testset.docstore import InMemoryDocumentStore

# Initialize your LLM and embeddings instances.
# The base classes are shown as placeholders; substitute concrete implementations.
generator_llm = BaseRagasLLM()
critic_llm = BaseRagasLLM()
embeddings = BaseRagasEmbeddings()

# Create a TestsetGenerator instance
testset_generator = TestsetGenerator(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
    docstore=InMemoryDocumentStore(),
)
# Now you can use the testset_generator to handle keyphrase extraction
This approach ensures that the KeyphraseExtractor is properly integrated into the document store, making it easier to manage and use within your RAG pipeline [1][2][3].
@ableiweiss that is an interesting suggestion. How are you planning to use it?
We are working on a revamp of testset generation (#1016) and will add it to that, but if you could explain your reasoning and use case, it would help us a lot 🙂
Is it to get the keyphrases extracted by TestsetGenerator using the API?