explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0
7.16k stars 728 forks

Subject: Issue with Agentic AI Metrics (Topic Adherence and Agent Goal Accuracy) #1633

Open Nari995 opened 2 days ago

Nari995 commented 2 days ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Hello,

We are currently encountering an issue when running the agentic metrics, specifically Topic Adherence and Agent Goal Accuracy. With version v0.2.3, the error message we are receiving is:

TypeError: object of type 'StringPromptValue' has no len()

We tried other versions as well and encountered the errors below:

with version v0.2.1: TypeError: Can't instantiate abstract class AgentGoalAccuracyWithReference with abstract method _ascore

with version v0.2.2: TypeError: Can't instantiate abstract class AgentGoalAccuracyWithReference with abstract method _ascore

We're unable to proceed with the metrics evaluation. Could you please assist us in resolving this? Any insights or guidance on how to address this error would be greatly appreciated.

Thank you in advance for your support!

sahusiddharth commented 2 days ago

Hi @Nari995, I recently encountered and solved a similar error, but I'm unable to pinpoint exactly where it occurred in my code. Could you please share the code snippet you're working with, along with the full error message? That would help me understand the issue better and assist you more effectively.

Nari995 commented 1 day ago

Hi @sahusiddharth, please find the required details below.

Code snippet

import asyncio

from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall
from ragas.metrics import AgentGoalAccuracyWithReference

async def main():
    sample = MultiTurnSample(user_input=[
        HumanMessage(content="Hey, book a table at the nearest best Chinese restaurant for 8:00pm"),
        AIMessage(content="Sure, let me find the best options for you.", tool_calls=[
            ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8:00pm"})
        ]),
        ToolMessage(content="Found a few options: 1. Golden Dragon, 2. Jade Palace"),
        AIMessage(content="I found some great options: Golden Dragon and Jade Palace. Which one would you prefer?"),
        HumanMessage(content="Let's go with Golden Dragon."),
        AIMessage(content="Great choice! I'll book a table for 8:00pm at Golden Dragon.", tool_calls=[
            ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8:00pm"})
        ]),
        ToolMessage(content="Table booked at Golden Dragon for 8:00pm."),
        AIMessage(content="Your table at Golden Dragon is booked for 8:00pm. Enjoy your meal!"),
        HumanMessage(content="thanks"),
    ],
    reference="Table booked at one of the chinese restaurants at 8 pm")

    scorer = AgentGoalAccuracyWithReference()
    scorer.llm = azure_model  # azure_model is an AzureChatOpenAI instance defined elsewhere in the file
    score = await scorer.multi_turn_ascore(sample)
    print(score)

# To run the async function, you need an event loop
if __name__ == "__main__":
    asyncio.run(main())

Error message

Traceback (most recent call last):
  File "c:\*****\Desktop\VS\agenticai.py", line 145, in <module>
    asyncio.run(main())
  File "C:\*****\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\*****\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\*****\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "c:\******\Desktop\VS\agenticai.py", line 138, in main
    score = await scorer.multi_turn_ascore(sample)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\******\AppData\Local\Programs\Python\Python311\Lib\site-packages\ragas\metrics\base.py", line 393, in multi_turn_ascore
    raise e
  File "C:\******\AppData\Local\Programs\Python\Python311\Lib\site-packages\ragas\metrics\base.py", line 386, in multi_turn_ascore
    score = await asyncio.wait_for(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\*******\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 442, in wait_for
    return await fut
           ^^^^^^^^^
  File "C:\******\AppData\Local\Programs\Python\Python311\Lib\site-packages\ragas\metrics\_goal_accuracy.py", line 129, in _multi_turn_ascore
    response = await self.workflow_prompt.generate(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\******\AppData\Local\Programs\Python\Python311\Lib\site-packages\ragas\prompt\pydantic_prompt.py", line 130, in generate
    output_single = await self.generate_multiple(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\******\AppData\Local\Programs\Python\Python311\Lib\site-packages\ragas\prompt\pydantic_prompt.py", line 190, in generate_multiple
    resp = await llm.generate(
                 ^^^^^^^^^^^^^
  File "C:\******\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_core\language_models\chat_models.py", line 544, in generate       
    batch_size=len(messages),
               ^^^^^^^^^^^^^
TypeError: object of type 'StringPromptValue' has no len()

Please let me know if any other information is needed.

jjmachan commented 1 day ago

hey @Nari995, what is the type of azure_model? is it wrapped? do you face this problem every time?

Nari995 commented 1 day ago

Hi @jjmachan, azure_model is an instance of the AzureChatOpenAI class. Please find the snippet below for reference.

from langchain_openai import AzureChatOpenAI

azure_model = AzureChatOpenAI(
    openai_api_version="2024-02-15-preview",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

We don't face this problem every time. We never faced it when using the other Ragas metrics (AnswerCorrectness, AnswerRelevancy, Faithfulness, etc.); we are facing it only with the agentic metrics.
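For context, azure_configs in the snippet above is just a plain dict holding our deployment details. A minimal sketch with placeholder values (the actual endpoint, deployment, and model names are environment-specific and illustrative only):

# Placeholder values only -- substitute your own Azure OpenAI deployment details.
azure_configs = {
    "base_url": "https://<your-resource>.openai.azure.com/",
    "model_deployment": "<your-deployment-name>",
    "model_name": "<your-model-name>",
}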

sahusiddharth commented 1 day ago

@Nari995, I think the issue might be caused by a mix-up between Ragas messages and LangChain messages. You can try to fix this by explicitly using the message types from Ragas.

import ragas.messages as r

# Use Ragas message types directly
messages = [
    r.HumanMessage(content="Your message"),
    r.AIMessage(content="AI response"),
    r.ToolMessage(content="Tool output")
]

I’m not 100% sure, but this might help clear up the confusion. Please give it a try and let me know how it goes!

Nari995 commented 1 day ago

hello @sahusiddharth,

Still, we are getting the same error.

TypeError: object of type 'StringPromptValue' has no len()

any other suggestions?

sahusiddharth commented 1 day ago

@Nari995, try

from ragas.llms import LangchainLLMWrapper
...
scorer.llm = LangchainLLMWrapper(azure_model)

Have a look at this colab notebook
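For reference, here is a minimal sketch of how the wrapper slots into the earlier snippet, inside the same async main() and assuming the same azure_model defined above (the evaluator_llm name is just illustrative):

from ragas.llms import LangchainLLMWrapper
from ragas.metrics import AgentGoalAccuracyWithReference

# Wrap the LangChain chat model so the metric talks to it through Ragas's own LLM interface
evaluator_llm = LangchainLLMWrapper(azure_model)

scorer = AgentGoalAccuracyWithReference()
scorer.llm = evaluator_llm
score = await scorer.multi_turn_ascore(sample)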

Nari995 commented 19 hours ago

Hi @sahusiddharth, it is working fine with the above suggestion. We are able to move ahead with the metrics evaluation. Thank you so much for your assistance.