**runwean** opened this issue 1 month ago
Hey @runwean! 👋 I'm here to help you out with any bugs, questions, or contributions while you wait for a human maintainer. Let's tackle this issue together!
To resolve the `pydantic.v1.error_wrappers.ValidationError` related to the JSON format in the `Prompt` class when changing the language to Chinese in the `adapt` function, you need to ensure that the JSON structure is correctly handled and validated. The error likely arises because the translated JSON no longer matches the expected structure.
Here is a step-by-step approach to address this issue:

1. **Ensure JSON structure consistency.** Make sure the translated JSON keeps the same keys and nested structure as the original JSON.
2. **Validate the JSON structure.** After translation, check that the JSON matches the expected format.
3. **Handle JSON loading and parsing safely.** Use tolerant methods to load and parse JSON so that slightly malformed LLM output does not break the pipeline.
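As a minimal, standalone sketch of steps 1 and 2, the structure check boils down to a recursive key comparison between the original and the translated JSON (the example payload below is illustrative, not from ragas):

```python
import json


def get_all_keys(nested_json: dict) -> set:
    """Recursively collect every key in a nested JSON object."""
    keys = set()
    for key, value in nested_json.items():
        keys.add(key)
        if isinstance(value, dict):
            keys |= get_all_keys(value)
    return keys


original = {"answer": {"text": "yes", "reason": "because"}}
translated = json.loads('{"answer": {"text": "是", "reason": "因为"}}')

# Translation may change the values, but must preserve the key structure.
assert get_all_keys(original) == get_all_keys(translated)
```

If this assertion fails after translation, the LLM has altered the schema (e.g. translated the keys themselves), which is exactly what later trips Pydantic validation.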
Here is the relevant part of the `adapt` function with added validation and error handling:
```python
def adapt(
    self, language: str, llm: BaseRagasLLM, cache_dir: t.Optional[str] = None
) -> Prompt:
    def get_all_keys(nested_json):
        keys = set()
        for key, value in nested_json.items():
            keys.add(key)
            if isinstance(value, dict):
                keys = keys.union(get_all_keys(value))
        return keys

    if self.language == language:
        return self

    # TODO: Add callbacks
    cache_dir = cache_dir if cache_dir else get_cache_dir()
    if os.path.exists(os.path.join(cache_dir, language, f"{self.name}.json")):
        return self._load(language, self.name, cache_dir)

    logger.info("Adapting %s to %s", self.name, language)
    prompts = []
    output_keys = []
    for example in self.examples:
        prompts.extend(
            [
                str_translation.format(
                    translate_to=language, input=example.get(key)
                )
                for key in self.input_keys
            ]
        )
        prompts.append(
            json_translatation.format(
                translate_to=language, input=example.get(self.output_key)
            )
            if self.output_type.lower() == "json"
            else str_translation.format(
                translate_to=language, input=example.get(self.output_key)
            )
        )
        if self.output_type.lower() == "json":
            output = example.get(self.output_key)
            if isinstance(output, str):
                output = json.loads(output)
            if isinstance(output, dict):
                output_keys.append(get_all_keys(output))
            elif isinstance(output, list) and all(
                isinstance(item, dict) for item in output
            ):
                output_keys.append([get_all_keys(item) for item in output])

    # NOTE: this is a slow loop, consider Executor to fasten this
    results = []
    for p in prompts:
        results.append(llm.generate_text(p).generations[0][0].text)

    per_example_items = len(self.input_keys) + 1
    grouped_results = [
        results[i : i + per_example_items]
        for i in range(0, len(results), per_example_items)
    ]
    assert len(grouped_results) == len(
        self.examples
    ), "examples and adapted examples must be of equal length"

    for i, example in enumerate(grouped_results):
        example_dict = {}
        example_dict.update(
            {k: v for k, v in zip(self.input_keys, example[: len(self.input_keys)])}
        )
        example_dict[self.output_key] = (
            json_loader._safe_load(example[-1], llm)
            if self.output_type.lower() == "json"
            else example[-1]
        )

        if self.output_type.lower() == "json":
            output = example_dict[self.output_key]
            if isinstance(output, dict):
                assert (
                    set(output.keys()) == output_keys[i]
                ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
            elif isinstance(output, list) and all(
                isinstance(item, dict) for item in output
            ):
                assert all(
                    set(item.keys()) in output_keys[i] for item in output
                ), "Adapted output keys do not match with the original output keys"

        self.examples[i] = example_dict

    self.language = language

    # TODO: Validate the prompt after adaptation
    return self
```
This code validates the JSON structure after translation, which should help in resolving the `pydantic.v1.error_wrappers.ValidationError` [1].
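For step 3, ragas routes LLM output through `json_loader._safe_load`. A standalone sketch of that idea (the helper name and fallback order here are ours, not the library's) is to try several candidate substrings before giving up, since LLMs often wrap JSON in Markdown fences or surrounding prose:

```python
import json
import re


def safe_load_json(text: str):
    """Best-effort parse of LLM output that should be JSON.

    Tries the raw text first, then the contents of a ```json fenced
    block, then the first {...} span. Returns None if nothing parses.
    """
    candidates = [text]
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    braced = re.search(r"\{.*\}", text, re.DOTALL)
    if braced:
        candidates.append(braced.group(0))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

For example, `safe_load_json('```json\n{"answer": "是"}\n```')` recovers the dict even though the raw text is not valid JSON.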
To continue talking to Dosu, mention @dosu.
same issue
@Mervyn-1 @runwean sorry about this, but do keep an eye on #1101 and #890. Will update that issue when we have a fix.
I've encountered the same issue. It would be greatly appreciated if the author could look into it. Thank you so much!
It might not be an issue with the LLM API. I tried GPT-4o, GPT-4o-mini, and DeepSeek; their translated cache JSON is almost identical, and I can't find any errors in it. Also, the first time you run the adaptation and test generation it works fine, so I suspect something goes wrong when the prompt JSON saved as cache is read back.
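One way to test that hypothesis is to load the cached file yourself and see whether it even parses (a sketch; the cache layout `<cache_dir>/<language>/<prompt_name>.json` and the helper names are assumptions based on the `adapt` code above, not ragas API):

```python
import json
from pathlib import Path


def check_cached_prompt_text(raw: str) -> list:
    """Parse the raw contents of a cached prompt JSON file and return
    its sorted top-level keys, for comparison with the English prompt.

    A json.JSONDecodeError here means the cached file itself is
    malformed; a clean load shifts suspicion to how the translated
    examples are re-validated when the cache is read back.
    """
    return sorted(json.loads(raw).keys())


def check_cached_prompt(path: Path) -> list:
    """Same check, reading from the assumed cache location on disk."""
    return check_cached_prompt_text(path.read_text(encoding="utf-8"))
```

Usage would be something like `check_cached_prompt(Path(cache_dir) / "chinese" / f"{prompt.name}.json")`, mirroring the path built inside `adapt`.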
I want to customize the `adapt` function to change the language to Chinese.
This is the code:
and this is the error: