rtzy7 opened 4 months ago
Whoops! I'm assuming this answers my question.
@rtzy7 Interesting. If you use it within GEval, we gracefully handle the AttributeError, which is why we raised it. But when you call it as a standalone you'll get the error :)
@penguine-ip is there an end-to-end example for how to use GEval with Azure?
In case anyone finds it useful: GEval does not currently work as expected with Azure on API versions 2024-02-01 and later (I don't know about earlier versions).
1) API version 2024-02-01: does not support logprobs at all, so they are not used in the calculation. The scores come out with one decimal point (e.g. 0.2, 0.7, etc.).
2) API versions 2024-03-01-preview, 2024-04-01-preview, 2024-05-01-preview: support a maximum of 5 top_logprobs. The value 20 is hardcoded here, so I'm getting an "Invalid value for 'top_logprobs': must be less than or equal to 5." error. Changing 20 to 5 in the source code seems to work, which is the workaround I'm currently using.
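For anyone applying the same workaround, it amounts to clamping the hardcoded 20 down to Azure's limit of 5 before the request is built. A minimal sketch of the idea (the helper and constant names are my own, not deepeval's; the kwargs follow the OpenAI chat completions API):

```python
AZURE_TOP_LOGPROBS_MAX = 5  # limit on the 2024-03/04/05 preview API versions

def raw_response_kwargs(prompt: str, requested_top_logprobs: int = 20) -> dict:
    """Build chat.completions.create kwargs with top_logprobs clamped for Azure.

    deepeval hardcodes 20; Azure's preview API versions reject anything > 5,
    so we take the minimum of the two before sending the request.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": True,
        "top_logprobs": min(requested_top_logprobs, AZURE_TOP_LOGPROBS_MAX),
    }
```

Patching the constant in the installed package works the same way; this just shows where the 5 has to end up.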
@petrgazarov Thanks! If I'm understanding it correctly, the template here grades the response from 0 to 10, which then gets scaled here. I'm guessing this explains the one-decimal-point values!
Also, for the workaround you mentioned, should one also change the template to grade the response on a scale of 1-5 instead of a scale of 0-10? This is so that the weighted summed score can be generated as expected, given that the max value for the top_logprobs parameter is 5.
I'm pretty sure that the scale in the template has nothing to do with logprobs. Passing 5 to top_logprobs would return the top 5 log probs instead of the top 20 (like in the example here).
Getting the following error when initializing my fluency_metric, passing the example Azure model from the docs into GEval as the model parameter. Not sure why it is asking for the OpenAI API key. Any ideas here?
Error:
Traceback (most recent call last):
  File "/Users/cthomps3/Documents/git/hmh/genai-platform-core-1/genai_core_component_library/evaluator/genai_evaluator/eval_test.py", line 7, in <module>
    from summarization_eval_strategy import SummarizationStrategy
  File "/Users/cthomps3/Documents/git/hmh/genai-platform-core-1/genai_core_component_library/evaluator/genai_evaluator/summarization_eval_strategy.py", line 10, in <module>
    from fluency_metric import fluency_metric
  File "/Users/cthomps3/Documents/git/hmh/genai-platform-core-1/genai_core_component_library/evaluator/genai_evaluator/fluency_metric.py", line 10, in <module>
    fluency_metric = GEval(
                     ^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/metrics/g_eval/g_eval.py", line 106, in __init__
    self.model, self.using_native_model = initialize_model(model)
                                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/metrics/utils.py", line 86, in initialize_model
    return GPTModel(model=model), True
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/models/gpt_model.py", line 61, in __init__
    super().__init__(model_name)
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/models/base_model.py", line 35, in __init__
    self.model = self.load_model(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/models/gpt_model.py", line 96, in load_model
    return ChatOpenAI(
           ^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 2 validation errors for ChatOpenAI
model
  none is not an allowed value (type=type_error.none.not_allowed)
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)
Custom Model:
class AzureOpenAI(DeepEvalBaseLLM):
    """Custom Azure OpenAI Model for evaluation."""

    def __init__(self, model):
        if model is None:
            raise ValueError("Model cannot be None")
        self.model = model

    def load_model(self):
        """Load the Azure OpenAI model."""
        try:
            return self.model
        except Exception as e:
            print(f"An error occurred while loading the model: {e}")
            return None

    def generate(self, prompt: str) -> str:
        """
        Generate output synchronously using the Azure OpenAI model.

        Parameters
        ----------
        prompt : str
            The prompt to generate output from.

        Returns
        -------
        str
            The generated output.
        """
        if not isinstance(prompt, str):
            raise ValueError("Prompt must be a string")
        try:
            chat_model = self.load_model()
            return chat_model.invoke(prompt).content
        except Exception as e:
            print(f"An error occurred while generating output: {e}")
            return None

    async def a_generate(self, prompt: str) -> str:
        """
        Generate output asynchronously using the Azure OpenAI model.

        Parameters
        ----------
        prompt : str
            The prompt to generate output from.

        Returns
        -------
        str
            The generated output.
        """
        if not isinstance(prompt, str):
            raise ValueError("Prompt must be a string")
        try:
            chat_model = self.load_model()
            res = await chat_model.ainvoke(prompt)
            return res.content
        except Exception as e:
            print(f"An error occurred while generating output asynchronously: {e}")
            return None

    def get_model_name(self):
        """
        Get the name of the Azure OpenAI model deployment.

        Returns
        -------
        str
            The name of the Azure OpenAI model deployment.
        """
        return self.model.deployment_name
custom_model = AzureChatOpenAI(
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_deployment=os.getenv("AZURE_DEPLOYMENT_NAME"),
    azure_endpoint=os.getenv("AZURE_OPENAI_API_BASE"),
    openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)
Custom Metric:
from eval_models import custom_model

fluency_metric = GEval(
    name="Fluency",
    criteria="Fluency measures the quality of individual sentences in the answer, and whether they are well-written and grammatically correct. Consider the quality of individual sentences when evaluating fluency.",
    model=custom_model,
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
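Judging by the traceback, the problem may be that custom_model is a raw AzureChatOpenAI instance rather than a DeepEvalBaseLLM subclass, so deepeval falls back to GPTModel, which builds a ChatOpenAI and therefore demands OPENAI_API_KEY; wrapping it (e.g. model=AzureOpenAI(model=custom_model)) should avoid that path. A simplified stand-in to illustrate the assumed dispatch (not deepeval's actual source):

```python
class DeepEvalBaseLLM:
    """Stand-in for deepeval.models.DeepEvalBaseLLM."""

def initialize_model(model):
    # Simplified sketch of deepeval's dispatch: custom DeepEvalBaseLLM
    # subclasses are used as-is; anything else (including a raw langchain
    # AzureChatOpenAI) falls through to GPTModel/ChatOpenAI, which requires
    # OPENAI_API_KEY regardless of any Azure settings.
    if isinstance(model, DeepEvalBaseLLM):
        return model, False  # custom model path: no OpenAI key needed
    raise RuntimeError("fallback: GPTModel -> ChatOpenAI -> OPENAI_API_KEY required")

class AzureWrapper(DeepEvalBaseLLM):
    """Minimal hypothetical wrapper around a langchain chat model."""
    def __init__(self, chat_model):
        self.chat_model = chat_model

raw_langchain_model = object()        # stands in for AzureChatOpenAI(...)
wrapped = AzureWrapper(raw_langchain_model)
model, is_native = initialize_model(wrapped)  # takes the custom-model branch
```

If this reading is right, passing the wrapped instance to GEval's model parameter should stop the OpenAI key lookup entirely.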
Hi! I have noticed that G-Eval uses .generate_raw_response for its calculations. I wanted to understand the inner workings of the metric and thus tried to dive deeper into the code.
I have initialized the env variables and have gotten "🙌 Congratulations! You're now using Azure OpenAI for all evals that require an LLM."