Closed Z-oo883 closed 1 month ago
To resolve the "1 validation error for Prompt" error when generating data in Chinese, ensure that the output in your examples is in a valid JSON format. The error indicates that the output field in one of your examples is not properly formatted as JSON.
Here's what you can do:
Check the Examples: Ensure that each example in your Prompt class has a valid JSON structure for the output. If the output is a string, it should be properly quoted.
Validate JSON: Use a JSON validator to check the format of your examples.
Here's an example of how your Prompt object should be structured:
from ragas.llms.prompt import Prompt

prompt = Prompt(
    name="chinese_translation",
    instruction="Translate the following text to Chinese.",
    output_format_instruction="",
    examples=[
        {
            "translate_to": "chinese",
            "input": "What is the capital of China?",
            "output": "中国的首都是哪里?"  # Ensure this is a valid JSON string
        }
    ],
    input_keys=["translate_to", "input"],
    output_key="output",
    output_type="str",
    language="chinese"
)
Make sure all fields are correctly filled and the output is a valid JSON string [1].
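The "Validate JSON" step can also be scripted. The following is a minimal sketch, not part of ragas: `find_bad_examples` is a hypothetical helper, and it assumes the cache directory holds one JSON file per adapted prompt with the same top-level fields as the Prompt object above (`examples`, `output_key`, `output_type`). It also assumes that only prompts whose `output_type` is "json" need JSON-parseable outputs.

```python
import json
from pathlib import Path


def find_bad_examples(cache_dir):
    """Return (file, example_number, reason) for each output string that is not valid JSON."""
    problems = []
    for path in sorted(Path(cache_dir).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (ValueError, OSError) as exc:
            # The cache file itself could not be read or parsed.
            problems.append((str(path), 0, f"file unreadable: {exc}"))
            continue
        if not isinstance(data, dict):
            continue
        # Assumption: prompts with output_type "str" do not need JSON outputs.
        if str(data.get("output_type", "json")).lower() != "json":
            continue
        output_key = data.get("output_key", "output")
        for number, example in enumerate(data.get("examples", []), start=1):
            value = example.get(output_key) if isinstance(example, dict) else None
            if isinstance(value, str):
                try:
                    json.loads(value)
                except ValueError as exc:
                    problems.append((str(path), number, str(exc)))
    return problems


if __name__ == "__main__":
    # "a_path" is the cache_dir value used in the issue's script.
    for file, number, reason in find_bad_examples("a_path"):
        print(f"{file}: example {number}: {reason}")
```

Any file it reports is a candidate for manual correction or for deletion, so that the prompt is adapted again on the next run.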
@Z-oo883 could you check out: https://docs.ragas.io/en/stable/howtos/customizations/metrics/_metrics_language_adaptation
It's in ragas v0.2, but it's not really complicated to migrate: https://docs.ragas.io/en/stable/howtos/migrations/migrate_from_v01_to_v02/
It seems the issue was answered, closing this now.
ragas 0.1.21, python3.9
code:
import nest_asyncio
nest_asyncio.apply()

from ragas.testset import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("xx.pdf")
documents = loader.load_and_split()
print(documents)
for document in documents:
    document.metadata['filename'] = document.metadata['source']

# generator with openai models
generator_llm = ChatOpenAI(
    model="Qwen2",
    temperature=0.3,
    openai_api_key="xxx",
    openai_api_base='xxx',
    stop=['<|im_end|>']
)
critic_llm = ChatOpenAI(
    model="Qwen2",
    temperature=0.3,
    openai_api_key="xxx",
    openai_api_base='xxx',
    stop=['<|im_end|>']
)
# raw string, so that "\b" is not read as a backspace escape
embedding_model_name = r"\embedding\bge-large-zh-v1.5"
embedding_model_kwargs = {'device': 'cpu'}
embedding_encode_kwargs = {'batch_size': 32, 'normalize_embeddings': True}

embed_model = HuggingFaceEmbeddings(
    model_name=embedding_model_name,
    model_kwargs=embedding_model_kwargs,
    encode_kwargs=embedding_encode_kwargs
)

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embed_model
)

language = "chinese"
generator.adapt(language, evolutions=[simple, reasoning, multi_context], cache_dir="a_path")
generator.save(evolutions=[simple, reasoning, multi_context], cache_dir="a_path")

# generate testset
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=1,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    with_debugging_logs=True
)
df = testset.to_pandas()
print(testset)
df.to_csv("test_set.csv", index=False, encoding='utf-8_sig')
error:
Traceback (most recent call last):
  File "D:\RA_LLM\pythonProject\generate_test_data.py", line 55, in <module>
    generator.adapt(language, evolutions=[simple, reasoning,multi_context],cache_dir="a_path")
  File "D:\anaconda\envs\ragas\lib\site-packages\ragas\testset\generator.py", line 340, in adapt
    self.docstore.extractor.adapt(language, cache_dir=cache_dir)
  File "D:\anaconda\envs\ragas\lib\site-packages\ragas\testset\extractor.py", line 61, in adapt
    self.extractor_prompt = self.extractor_prompt.adapt(
  File "D:\anaconda\envs\ragas\lib\site-packages\ragas\llms\prompt.py", line 185, in adapt
    self_cp = self._load(language, self.name, cache_dir)
  File "D:\anaconda\envs\ragas\lib\site-packages\ragas\llms\prompt.py", line 286, in _load
    return cls(**json.load(open(path)))
  File "D:\anaconda\envs\ragas\lib\site-packages\pydantic\v1\main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Prompt
__root__
  output in example 1 is not in valid json format: Expecting value: line 1 column 1 (char 0) (type=value_error)
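For reference, the tail of that message, "Expecting value: line 1 column 1 (char 0)", is what Python's standard `json` module reports when the string it is given does not start with any JSON value, for example an empty string or untranslated plain text. The sketch below only demonstrates that behaviour; `json_error` is a hypothetical helper, and the sample strings are illustrative rather than taken from the cached prompt file, whose contents are not shown in the thread.

```python
import json


def json_error(text):
    """Return the json module's error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Illustrative candidates: empty, plain Chinese text, a JSON object, a quoted JSON string.
for candidate in ["", "中国的首都是哪里?", '{"answer": "北京"}', '"北京"']:
    print(repr(candidate), "->", json_error(candidate))
```

The first two candidates produce the same message as the traceback, while the last two parse cleanly. This suggests, without confirming it, that the cached example's output was saved empty or as bare text rather than as JSON.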