Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License
226 stars 78 forks

Bump azure-ai-evaluation from 1.0.0b3 to 1.0.0b5 #110

Open dependabot[bot] opened 1 week ago

dependabot[bot] commented 1 week ago

Bumps azure-ai-evaluation from 1.0.0b3 to 1.0.0b5.

Release notes

Sourced from azure-ai-evaluation's releases.

azure-ai-evaluation_1.0.0b5

1.0.0b5 (2024-10-28)

Features Added

  • Added GroundednessProEvaluator, which is a service-based evaluator for determining response groundedness.
  • Groundedness detection in Non Adversarial Simulator via query/context pairs:

    import asyncio
    import importlib.resources as pkg_resources
    import json

    from azure.ai.evaluation.simulator import Simulator

    # model_config and callback are assumed to be defined by the caller.
    package = "azure.ai.evaluation.simulator._data_sources"
    resource_name = "grounding.json"
    custom_simulator = Simulator(model_config=model_config)
    conversation_turns = []
    # Load the bundled query/context pairs; each pair seeds a one-turn conversation.
    with pkg_resources.path(package, resource_name) as grounding_file:
        with open(grounding_file, "r") as file:
            data = json.load(file)
    for item in data:
        conversation_turns.append([item])
    outputs = asyncio.run(custom_simulator(
        target=callback,
        conversation_turns=conversation_turns,
        max_conversation_turns=1,
    ))

  • Added evaluator for multimodal use cases

Breaking Changes

  • Renamed environment variable PF_EVALS_BATCH_USE_ASYNC to AI_EVALS_BATCH_USE_ASYNC.
  • RetrievalEvaluator now requires a context input in addition to query in single-turn evaluation.
  • RelevanceEvaluator no longer takes context as an input. It now only takes query and response in single-turn evaluation.
  • FluencyEvaluator no longer takes query as an input. It now only takes response in single-turn evaluation.
  • The AdversarialScenario enum no longer includes ADVERSARIAL_INDIRECT_JAILBREAK. Invoking IndirectJailbreak or XPIA should be done with IndirectAttackSimulator.
  • Outputs of Simulator and AdversarialSimulator previously had to_eval_qa_json_lines and now have to_eval_qr_json_lines. Where to_eval_qa_json_lines had:

    {"question": <user_message>, "answer": <assistant_message>}

    to_eval_qr_json_lines now has:

    {"query": <user_message>, "response": <assistant_message>}
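For datasets already saved in the old format, the key rename described above can be applied mechanically. The following is a minimal sketch and not part of the SDK: `convert_qa_to_qr` is a hypothetical helper name, and it assumes each non-blank input line is a JSON object.

```python
import json


def convert_qa_to_qr(jsonl_text: str) -> str:
    """Rewrite JSON Lines records from question/answer keys to query/response keys."""
    # Hypothetical helper for illustration; not provided by azure-ai-evaluation.
    renames = {"question": "query", "answer": "response"}
    converted = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Rename the two known keys and leave any other keys untouched.
        converted.append(json.dumps({renames.get(k, k): v for k, v in record.items()}))
    return "\n".join(converted)


old_lines = '{"question": "What is RAG?", "answer": "Retrieval-augmented generation."}'
print(convert_qa_to_qr(old_lines))
```

Keys other than `question` and `answer` pass through unchanged, so any extra fields a project stores per record would survive the conversion.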

Bugs Fixed

  • Non adversarial simulator works with gpt-4o models using the json_schema response format
  • Fixed an issue where the evaluate API would fail with "[WinError 32] The process cannot access the file because it is being used by another process" when venv folder and target function file are in the same directory.
  • Fixed evaluate API failure when trace.destination is set to none
  • Non adversarial simulator now accepts context from the callback

Other Changes

  • Improved error messages for the evaluate API by enhancing the validation of input parameters. This update provides more detailed and actionable error descriptions.
  • GroundednessEvaluator now supports query as an optional input in single-turn evaluation. If query is provided, a different prompt template will be used for the evaluation.
  • To align with our support of a diverse set of models, the following evaluators will now have a new key in their result output without the gpt_ prefix. To maintain backwards compatibility, the old key with the gpt_ prefix will still be present in the output; however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.

... (truncated)
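While both keys are present in the output, downstream code can prefer the unprefixed key and fall back to the gpt_-prefixed one. This is a minimal sketch, assuming the evaluator result is a plain dict: `read_metric` is a hypothetical helper, and the metric name `coherence` is only illustrative because the list of affected evaluators is truncated in these notes.

```python
def read_metric(result: dict, name: str):
    """Return result[name], falling back to the legacy gpt_-prefixed key."""
    # Hypothetical helper for illustration; not provided by azure-ai-evaluation.
    if name in result:
        return result[name]
    legacy = f"gpt_{name}"
    if legacy in result:
        return result[legacy]
    raise KeyError(f"neither {name!r} nor {legacy!r} found in result")


# Illustrative result dicts; the metric name is an assumption.
print(read_metric({"gpt_coherence": 4.0, "coherence": 4.0}, "coherence"))
print(read_metric({"gpt_coherence": 3.0}, "coherence"))
```

Reading through a helper like this keeps the fallback in one place, so it can be dropped once the old keys are removed.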


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)