francescofan opened 3 days ago
To adapt the language of questions and ground-truth results to match the language of input documents in multiple languages using the Ragas framework, you can use the `adapt` function. However, the Ragas framework does not have a built-in feature to automatically detect the language of input documents. You will need to manually specify the language for each document when using the `adapt` function.

Here's a basic example of how you can use the `adapt` function:
```python
from ragas.metrics import (
    faithfulness,
    answer_correctness,
)
from langchain.chat_models import ChatOpenAI
from ragas import adapt

# Specify the language model to use for adaptation
openai_model = ChatOpenAI(model_name="gpt-4")

# Adapt the metrics to the target language
# Replace "target_language" with the actual language code for each document
adapt(metrics=[faithfulness, answer_correctness], language="target_language", llm=openai_model)
```
In your case, since the documents are in different languages, you would need to run the `adapt` function separately for each document, specifying the correct language code (e.g., "en" for English, "ja" for Japanese, "zh" for Chinese, "fr" for French) for each one.
If you want to automate the language detection process, you might consider integrating an external language detection library to determine the language of each document before calling the `adapt` function. Unfortunately, Ragas does not provide utilities for integrating such libraries directly, so you would need to handle this part separately.
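For illustration, here is a minimal stdlib-only sketch of that approach: documents are grouped by detected language so that `adapt` only needs to run once per language group. The `detect` callable is an assumption, not part of Ragas — in practice it could be something like `langdetect.detect`.

```python
from collections import defaultdict

def group_by_language(documents, detect):
    # Group documents by the language code returned by `detect`,
    # so adapt() can be called once per language instead of once per document.
    groups = defaultdict(list)
    for doc in documents:
        groups[detect(doc)].append(doc)
    return dict(groups)

# Hypothetical usage with a real detector (assumed, not verified here):
#   from langdetect import detect
#   for lang, docs in group_by_language(documents, detect).items():
#       adapt(metrics=[faithfulness, answer_correctness], language=lang, llm=openai_model)
```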
To continue talking to Dosu, mention @dosu.
@dosu can you list all the values the language parameter can be set to?
The `language` parameter values you can set typically correspond to standard language codes. Here are some common ones you might use:

- "en" — English
- "ja" — Japanese
- "zh" — Chinese
- "fr" — French
- "de" — German
- "es" — Spanish

These codes follow the ISO 639-1 standard. If you need a language not listed here, you can usually find its code in the ISO 639-1 list.
@dosu when I ran the code after modifying the language, the following error occurred:
Error:
```
Traceback (most recent call last):
  File "d:\Notes\2.Python\next_console\next_console_qa_evaluate\ragas\ragas_test.py", line 112, in
```

Code:
```python
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)
language = "zh"
generator.adapt(language, evolutions=[simple, reasoning, conditional, multi_context])
generator.save(evolutions=[simple, reasoning, multi_context, conditional])
```
To resolve the "ValidationError: output in example 1 is not in valid JSON format" error, ensure that the `output` field in your examples is a valid JSON string. If the `output_type` is set to "json", the `output` must be a properly formatted JSON string. Here's how you can fix it:

- Make sure the `output` field in your examples is a valid JSON string. Use `json.dumps()` to convert a dictionary or list to a JSON string if necessary.
- If the `output` is already a JSON object (like a dictionary or list), ensure it is correctly formatted without syntax errors.

Here's an example of how to correctly format the `output`:
```python
import json

examples = [
    {
        "translate_to": "chinese",
        "input": {
            "statements": [
                "Albert Einstein was born in Germany.",
                "Albert Einstein was best known for his theory of relativity.",
            ]
        },
        "output": json.dumps({
            "statements": [
                "阿尔伯特·爱因斯坦出生在德国。",
                "阿尔伯特·爱因斯坦因其相对论而闻名。",
            ]
        }),  # Ensure this is a valid JSON string
    },
    # Add more examples as needed
]
```
Ensure that the `output` field is correctly formatted as a JSON string if `output_type` is "json" [1][2].
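To check programmatically whether every `output` field would pass this requirement, a small stdlib-only helper like the following can be used. This is an illustrative sketch, not part of Ragas:

```python
import json

def find_invalid_outputs(examples):
    # Return the 1-based indices of examples whose "output" field
    # is not a valid JSON string (matching the error message's numbering).
    bad = []
    for i, example in enumerate(examples, start=1):
        try:
            json.loads(example.get("output"))
        except (TypeError, ValueError):
            bad.append(i)
    return bad
```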
@dosu based on the message above, where is the output field?
The `output` field is typically part of the examples you provide when using the `adapt` function. It represents the expected output for a given input, formatted as a JSON string if the `output_type` is set to "json". In your code, it seems like the `output` field might be part of the data being processed by the `TestsetGenerator`.

To locate it, check the data or configuration files you are using with the `TestsetGenerator`. Look for any section where you define examples or expected outputs, and ensure that these outputs are correctly formatted as JSON strings. If you're using a configuration file or script to define these examples, verify that each `output` is properly structured and valid JSON.
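If you are unsure where an `output` key might be hiding inside nested data, a quick recursive scan can list every path at which it occurs. This is an illustrative, stdlib-only sketch and not a Ragas utility:

```python
def find_keys(obj, key, path=""):
    # Recursively collect the paths at which `key` appears in nested
    # dicts/lists, e.g. "examples[0].output".
    hits = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{path}.{k}" if path else k
            if k == key:
                hits.append(p)
            hits.extend(find_keys(v, key, p))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            hits.extend(find_keys(v, key, f"{path}[{i}]"))
    return hits
```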
@dosu I haven't defined any examples manually. Where could the examples be located?
If you haven't manually defined any examples, the `TestsetGenerator` in Ragas might be using default examples. The generator determines default examples using the `default_query_distribution` function when no query distribution is manually defined. This is set in the `generate` method, where `query_distribution` defaults to `default_query_distribution(self.llm)` if it is `None` [1].

To locate these default examples, you would need to check the implementation of the `default_query_distribution` function or any related configuration within the Ragas framework that might be providing these defaults.
@dosu could you tell me the exact place or filename where I should check?
You should check the `default_query_distribution` function, which is implemented in the `src/ragas/testset/synthesizers/__init__.py` file of the Ragas codebase [1]. This function is likely responsible for providing the default examples or query distribution used by the `TestsetGenerator`.
@dosu in the `Lib\site-packages\ragas` Anaconda packages directory, no file or directory is named `synthesizers`.
In the installed package directory, the structure might differ from the source repository. The functionality related to default examples or query distribution could be integrated into other parts of the package.
To troubleshoot further, consider checking the following:

- `generator.py` or `extractor.py`: These files might contain logic related to generating or adapting examples. Look for any functions or methods that deal with default settings or examples.
- Configuration files: Check if there are any configuration files or settings within the package that specify default behaviors or examples.
- Documentation or comments: Look for any documentation or comments within the code that might indicate where default examples are set or how they are generated.
If these steps don't help, you might need to explore the package's source code more broadly to understand how it handles default examples or query distributions. Alternatively, consider reaching out to the maintainers or community for more specific guidance on the package's structure.
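One stdlib-only way to find where a module actually lives in your installed environment (rather than guessing from the source repository's layout) is `importlib.util.find_spec`. The `ragas` module names in the comment below are example targets and may differ across versions:

```python
import importlib.util

def locate_module(name):
    # Return the file path of an importable module, or None if it
    # cannot be found in the current environment.
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# In your environment you might try locate_module("ragas") or
# locate_module("ragas.testset") to see which files are actually installed.
print(locate_module("json"))
```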
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
As the Ragas documentation says in "Automatic language adaptation" (https://docs.ragas.io/en/v0.1.1/howtos/applications/use_prompt_adaptation.html#language-adaptation-for-metrics), we can adapt the language to the document using the `adapt` function. But I suppose the documentation implies that I have to set the language in advance, because I have to know the language of the context I submit. In practice, I have papers and documents in different languages such as English, Japanese, Chinese, and French. I would like to get a set of questions and ground-truth results written in exactly the same language as the context I submit. Is that possible? If so, how can I achieve it?
Code Examples
This community speaks code. Share your code snippets to help us understand your question better.

Additional context
Anything else you want to share with us?