Open DatTNH23 opened 1 month ago
Hey there, @DatTNH23! I'm here to help you with your bug and answer any questions you may have. Let's tackle this together!
I found a similar open issue: Generate test data for 1 pdf, which is still open and was last updated on September 20, 2024. This issue was referenced by another issue: Runner thread raised exception and testset function is not accessible, last updated on September 24, 2024.
To resolve the syntax error in the `generate_with_langchain_docs` method when using RAGAS to generate a German dataset with an Ollama LLM and a HuggingFace embedding, ensure that the method is properly indented and correctly formatted. Here is the corrected version of the `generate_with_langchain_docs` method:
```python
def generate_with_langchain_docs(
    self,
    docs: t.Sequence[Document],
    test_size: int,
    distribution: QADistribution,
) -> TestDataset:
    return self.generate(docs, test_size, distribution)
```
Make sure that the `return` statement is correctly indented and placed within the method. This should resolve the syntax error [1][2].
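For orientation, here is a minimal usage sketch matching the signature shown above; `generator`, `docs`, and `distribution` are hypothetical placeholders, and parameter names can differ between RAGAS versions:

```python
# Sketch only: `generator` is an already constructed TestsetGenerator,
# `docs` a sequence of LangChain Documents, `distribution` a QADistribution.
testset = generator.generate_with_langchain_docs(docs, 10, distribution)
print(testset.to_pandas().head())  # TestDataset exposes to_pandas() in 0.1.x
```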
Additionally, ensure that the `language` parameter is correctly set and used in the `adapt` method. The `language` parameter must be a string and is used to adapt various components so that they function correctly in the specified language [1][3].
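For example, the generator can be adapted before generation, mirroring the call that appears in the traceback below. This is a sketch assuming `generator` is an already constructed `TestsetGenerator`; the adapt-then-save pattern follows the RAGAS 0.1.x documentation:

```python
from ragas.testset.evolutions import simple, reasoning, conditional

# Adapt the generator's prompts to German, then cache the adapted prompts.
generator.adapt("german", evolutions=[simple, reasoning, conditional])
generator.save(evolutions=[simple, reasoning, conditional])
```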
If you continue to face issues, please check the implementation of the `adapt` method in the `ragas/testset/extractor.py` file:
```python
def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
    """
    Adapt the extractor to a different language.
    """
    self.extractor_prompt = self.extractor_prompt.adapt(
        language, self.llm, cache_dir
    )
```
This method adapts the extractor to the specified language [4].
Lastly, ensure that the `extractor_prompt.adapt` method in the `ragas/llms/prompt.py` file correctly handles the adaptation process:
```python
example_dict = {}
example_dict.update(
    {k: v for k, v in zip(self.input_keys, example[: len(self.input_keys)])}
)
if self.output_type.lower() == "json":
    example_dict[self.output_key] = json_loader._safe_load(example[-1], llm)
    if example_dict[self.output_key] == {}:
        dict_str = example[-1].split("(")[0].strip()
        example_dict[self.output_key] = ast.literal_eval(dict_str)
else:
    example_dict[self.output_key] = example[-1]

if self.output_type.lower() == "json":
    output = example_dict[self.output_key]
    if isinstance(output, dict):
        assert (
            set(output.keys()) == output_keys[i]
        ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
    elif isinstance(output, list) and all(
        isinstance(item, dict) for item in output
    ):
        assert all(
            set(item.keys()) in output_keys[i] for item in output
        ), "Adapted output keys do not match with the original output keys"
```
This ensures that the `example_dict` is correctly populated and validated [5].
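Note that this fallback path produces exactly the reported error: when the LLM answers with prose instead of JSON, `json_loader._safe_load` returns `{}`, and `ast.literal_eval` is then applied to that prose, which is not a valid Python literal. A minimal standalone sketch (the sample string is hypothetical but mirrors the traceback):

```python
import ast

# The model replied with explanatory prose instead of JSON, so the fallback
# takes everything before the first "(" and tries to parse it as a literal.
raw = "To translate the values in the given JSON to the target language (German), ..."
dict_str = raw.split("(")[0].strip()
try:
    ast.literal_eval(dict_str)
except SyntaxError as err:
    print(err)  # invalid syntax (<unknown>, line 1)
```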
To continue talking to Dosu, mention @dosu.
@dosu It doesn't help; those two issues are not relevant to my issue.
- [ ] I have checked the documentation and related resources and couldn't resolve my bug.

**Describe the bug**
A clear and concise description of what the bug is.

**Ragas version:** 0.1.20
**Python version:** 3.12.6
**Code to Reproduce**

```python
from langchain_core.language_models import BaseLanguageModel
from langchain_core.outputs.llm_result import LLMResult
from langchain_core.outputs.generation import Generation
from langchain_core.runnables.config import run_in_executor
from langchain_core.embeddings import Embeddings
from langchain_core.messages import BaseMessage
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset.generator import TestsetGenerator
from ragas.llms.prompt import PromptValue
from ragas.testset.evolutions import simple, reasoning, conditional
from typing import List, Optional, Any, Sequence
import requests
import json
import traceback
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def handle_response(response_text):
    # Split the response into lines
    ...  # body elided in the report


class OllamaLLM(BaseLanguageModel):
    model: str = "llama3"
    ...  # body elided in the report


class EmbeddingOllama(Embeddings):
    ...  # body elided in the report


def main():
    model_name = "llama3.1"
    embedding_model = "bge-m3"
    ...  # body elided in the report


if __name__ == "__main__":
    main()
```
**Error trace**

```
Error in generate_with_langchain_docs: invalid syntax (<unknown>, line 1)
Traceback (most recent call last):
  File "/Users/nguyenthanhdat/Documents/Projekte/test/generate_ragas.py", line 195, in main
    generator.adapt(language, evolutions=[simple, reasoning, conditional])
  File "/Users/nguyenthanhdat/Documents/Projekte/test/myenv/lib/python3.12/site-packages/ragas/testset/generator.py", line 340, in adapt
    self.docstore.extractor.adapt(language, cache_dir=cache_dir)
  File "/Users/nguyenthanhdat/Documents/Projekte/test/myenv/lib/python3.12/site-packages/ragas/testset/extractor.py", line 61, in adapt
    self.extractor_prompt = self.extractor_prompt.adapt(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nguyenthanhdat/Documents/Projekte/test/myenv/lib/python3.12/site-packages/ragas/llms/prompt.py", line 246, in adapt
    example_dict[self.output_key] = ast.literal_eval(dict_str)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ast.py", line 66, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ast.py", line 52, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 1
    To translate the values in the given JSON to the target language
       ^^^^^^^^^
SyntaxError: invalid syntax
```
**Expected behavior**
I expect RAGAS to generate a German dataset for me.
**Additional context**
Here I use an LLM from Ollama (Llama3.1) and a HuggingFace embedding model.
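Given that the failure above comes from the model returning prose instead of JSON, one possible mitigation to try in the custom `OllamaLLM` wrapper is Ollama's JSON mode, which constrains the model to emit valid JSON. This is a sketch against Ollama's documented `/api/generate` endpoint, not a confirmed fix; the prompt and endpoint URL are placeholders:

```python
import json
import requests

# Ask Ollama to constrain its output to valid JSON so that RAGAS's
# json_loader._safe_load does not fall back to ast.literal_eval on prose.
response = requests.post(
    "http://localhost:11434/api/generate",  # default local Ollama endpoint
    json={
        "model": "llama3.1",
        "prompt": "Translate the values in the given JSON to German: ...",
        "format": "json",  # Ollama's structured-output option
        "stream": False,
    },
)
print(json.loads(response.json()["response"]))
```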