explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

ValidationError: 1 validation error for ChatMessage role Input should be a valid string [type=string_type, input_value=None, input_type=NoneType] #1544

Closed · miaoxiaolv closed this issue 3 weeks ago

miaoxiaolv commented 1 month ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

Requesting help. I started running ragas last week and encountered errors. Later, when I used the example from the ragas documentation at "https://docs.ragas.io/en/stable/getstarted/rag_evaluation/" for evaluation testing, I encountered the same error. The error is as follows:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
amnesty_qa.py: 100%
 6.82k/6.82k [00:00<00:00, 343kB/s]
README.md: 100%
 1.90k/1.90k [00:00<00:00, 127kB/s]
The repository for explodinggradients/amnesty_qa contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/explodinggradients/amnesty_qa.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
english.json: 100%
 70.8k/70.8k [00:00<00:00, 1.31MB/s]
Generating eval split: 
 20/0 [00:00<00:00, 246.48 examples/s]
DatasetDict({
    eval: Dataset({
        features: ['user_input', 'reference', 'response', 'retrieved_contexts'],
        num_rows: 20
    })
})
/usr/local/lib/python3.10/dist-packages/ragas/prompt/base.py:9: LangChainDeprecationWarning: As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain_core.pydantic_v1 module was a compatibility shim for pydantic v1, and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like: `from langchain_core.pydantic_v1 import BaseModel`
with: `from pydantic import BaseModel`
or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet.
    from pydantic.v1 import BaseModel

  from ragas.llms.prompt import PromptValue
Evaluating:   0%
 0/60 [00:04<?, ?it/s]
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-7-6864e4d138df> in <cell line: 14>()
     12 evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini", temperature=0))
     13 metrics = [LLMContextRecall(), FactualCorrectness(), Faithfulness()]
---> 14 results = evaluate(dataset=eval_dataset, metrics=metrics, llm=evaluator_llm, raise_exceptions=True)
     15 
     16 df = results.to_pandas()

50 frames
/usr/local/lib/python3.10/dist-packages/pydantic/main.py in __init__(self, **data)
    210         # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    211         __tracebackhide__ = True
--> 212         validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    213         if self is not validated_self:
    214             warnings.warn(

ValidationError: 1 validation error for ChatMessage
role
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type
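
For context, this error shape is simply pydantic v2 rejecting None for a required string field. A minimal stand-in sketch (the ChatMessage model below is illustrative only, not the actual class raised in the traceback):

from pydantic import BaseModel, ValidationError

class ChatMessage(BaseModel):
    # Stand-in model with a required string field, like the failing `role`
    role: str
    content: str

try:
    ChatMessage(role=None, content="hi")
except ValidationError as e:
    # Prints: 1 validation error for ChatMessage / role / Input should be a valid string
    print(e)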

Code Examples

Here is the sample code I ran:

from dotenv import load_dotenv
load_dotenv()
from datasets import load_dataset
dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")
print(dataset)

from ragas import EvaluationDataset
eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])

from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, SemanticSimilarity
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini", temperature=0))
metrics = [LLMContextRecall(), FactualCorrectness(), Faithfulness()]
results = evaluate(dataset=eval_dataset, metrics=metrics, llm=evaluator_llm, raise_exceptions=True)

df = results.to_pandas()
df.head()

Additional context

The versions I used are:

langchain 0.3.4
langchain-community 0.3.3
langchain-core 0.3.12
langchain-openai 0.2.3
langchain-text-splitters 0.3.0
ragas 0.2.1
pydantic 2.9.2
pydantic_core 2.23.4
pydantic-settings 2.6.0

dosubot[bot] commented 1 month ago

Hey there, @miaoxiaolv! I'm here to help you with bugs, questions, and becoming a contributor. Let's get that issue sorted out together!

The ValidationError you're encountering with the ChatMessage role likely stems from an incorrect or missing type attribute in your message objects. In the Ragas library, the ChatMessage role is represented by specific message classes like HumanMessage, ToolMessage, and AIMessage. Each of these classes requires a type attribute to be set to a specific literal value to avoid validation errors:

  1. HumanMessage: type must be "human".
  2. ToolMessage: type must be "tool".
  3. AIMessage: type must be "ai".

Ensure that the type attribute is correctly set for each message class you are using. Additionally, a ToolMessage must be preceded by an AIMessage with tool_calls defined [1]; see the sketch below.
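
For illustration, a minimal sketch of a well-formed multi-turn message list, assuming ragas 0.2.x exposes these classes under ragas.messages (verify against your installed version):

from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall

# Each class fixes its own `type` literal ("human", "ai", "tool"), so you never
# set the role string by hand; constructing the objects is enough.
sample_conversation = [
    HumanMessage(content="What is the weather in Paris?"),
    AIMessage(
        content="Let me check that for you.",
        tool_calls=[ToolCall(name="get_weather", args={"city": "Paris"})],
    ),
    # A ToolMessage must follow an AIMessage that declared tool_calls.
    ToolMessage(content="18 degrees C and sunny"),
]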

Regarding the Hugging Face Hub authentication warning: authentication is optional for public datasets, but you can silence the warning by setting HF_TOKEN. Create a token in your Hugging Face account settings and add it to your Colab environment, for example:

import os

# Set your Hugging Face token (create one at https://huggingface.co/settings/tokens)
os.environ["HF_TOKEN"] = "your_huggingface_token"

This will help you authenticate and access the datasets without warnings.
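
Alternatively, if the token is stored as a Colab secret named HF_TOKEN (as the warning suggests), it can be read at runtime; a sketch that only works inside Google Colab:

import os
from google.colab import userdata  # available only inside Google Colab

# Read the secret named "HF_TOKEN" from the Colab secrets panel
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")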

To continue talking to Dosu, mention @dosu.

miaoxiaolv commented 1 month ago

I have checked the dataset and there are no issues. Could it be a compatibility issue with some module versions?

dosubot[bot] commented 1 month ago

The Ragas library requires Pydantic version 2 or higher (pydantic>=2) for compatibility [1]. There is no pinned version requirement for LangChain, so recent releases should generally work. However, given the deprecation warning you encountered, there may be a compatibility issue between the LangChain and Pydantic versions you are using.

To address potential compatibility issues, ensure that your environment is set up correctly. You can create a virtual environment and install the necessary dependencies using:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -e ".[dev]"   # run from inside a clone of the ragas repository

This setup helps manage dependencies and reduces the risk of compatibility issues [2]. If the problem persists, consider checking for updates or patches in the Ragas library that might address compatibility with the latest versions of Langchain or Pydantic.
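
As a quick way to compare an environment against the versions listed above, a small sketch that prints the installed releases of the relevant packages:

from importlib.metadata import version

# Distribution names as they appear on PyPI
for pkg in ["ragas", "langchain-core", "langchain-openai", "pydantic"]:
    print(pkg, version(pkg))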

To continue talking to Dosu, mention @dosu.

jjmachan commented 1 month ago

@miaoxiaolv this could be because of a JSON decoding error - do you use any tracing tools?

miaoxiaolv commented 1 month ago

I'm really sorry, but this error was mainly due to some issues with my OpenAI API key. After changing the API key, the error is gone. Thank you for your help.
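
For anyone hitting the same error, a quick sanity check of the key before running the full evaluation can save time. A sketch, assuming OPENAI_API_KEY is set in the environment and gpt-4o-mini is available to the account:

from langchain_openai import ChatOpenAI

# A one-off call fails fast with an authentication error if the key is invalid,
# which is cheaper than discovering it partway through evaluate().
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
print(llm.invoke("ping").content)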

jjmachan commented 3 weeks ago

@miaoxiaolv glad to hear it is solved - closing this now 🙂