Bump azure-ai-evaluation from 1.0.0b3 to 1.0.0b5

Bumps azure-ai-evaluation from 1.0.0b3 to 1.0.0b5.

Release notes

Sourced from azure-ai-evaluation's releases.

azure-ai-evaluation_1.0.0b5

1.0.0b5 (2024-10-28)

Features Added

Added GroundednessProEvaluator, which is a service-based evaluator for determining response groundedness.

Groundedness detection in Non Adversarial Simulator via query/context pairs
import asyncio
import importlib.resources as pkg_resources
import json

from azure.ai.evaluation.simulator import Simulator

# model_config and callback are assumed to be defined elsewhere.
package = "azure.ai.evaluation.simulator._data_sources"
resource_name = "grounding.json"
custom_simulator = Simulator(model_config=model_config)
conversation_turns = []
# Load the grounding data that ships with the package.
with pkg_resources.path(package, resource_name) as grounding_file:
    with open(grounding_file, "r") as file:
        data = json.load(file)
# Each item becomes its own single-turn conversation.
for item in data:
    conversation_turns.append([item])
outputs = asyncio.run(custom_simulator(
    target=callback,
    conversation_turns=conversation_turns,
    max_conversation_turns=1,
))

Adding evaluator for multimodal use cases

Breaking Changes

Renamed environment variable PF_EVALS_BATCH_USE_ASYNC to AI_EVALS_BATCH_USE_ASYNC.
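For setups that still export the old variable name, a small shim run before evaluation can carry the value over. This is a minimal sketch, not part of the SDK: the helper `migrate_batch_async_env` is hypothetical, and only the two variable names come from the release notes.

```python
import os

OLD_NAME = "PF_EVALS_BATCH_USE_ASYNC"  # name used before 1.0.0b5
NEW_NAME = "AI_EVALS_BATCH_USE_ASYNC"  # name used from 1.0.0b5 on


def migrate_batch_async_env(env=None):
    """Copy the old variable to the new name when only the old one is set.

    Hypothetical helper. Returns the value stored under the new name,
    or None if neither variable is set. An existing new-name value wins.
    """
    env = os.environ if env is None else env
    if NEW_NAME not in env and OLD_NAME in env:
        env[NEW_NAME] = env[OLD_NAME]
    return env.get(NEW_NAME)
```

Calling `migrate_batch_async_env()` with no arguments operates on `os.environ`; passing a plain dict makes the behavior easy to check in isolation.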

RetrievalEvaluator now requires a context input in addition to query in single-turn evaluation.

RelevanceEvaluator no longer takes context as an input. It now only takes query and response in single-turn evaluation.

FluencyEvaluator no longer takes query as an input. It now only takes response in single-turn evaluation.
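The three input changes above can be summarized as a lookup that trims a dataset row down to the fields each evaluator takes in single-turn mode. This is an illustrative sketch only: `SINGLE_TURN_INPUTS` and `select_inputs` are hypothetical names, and the mapping assumes the inputs listed in the notes are the complete set for each evaluator.

```python
# Single-turn inputs per evaluator after 1.0.0b5, as read from the notes above.
# Assumption: no other inputs are needed beyond the ones the notes mention.
SINGLE_TURN_INPUTS = {
    "RetrievalEvaluator": ("query", "context"),
    "RelevanceEvaluator": ("query", "response"),
    "FluencyEvaluator": ("response",),
}


def select_inputs(evaluator_name, row):
    """Return only the fields the named evaluator takes.

    Hypothetical helper. Raises KeyError if the evaluator is unknown
    or the row lacks a listed field.
    """
    return {key: row[key] for key in SINGLE_TURN_INPUTS[evaluator_name]}
```

A row that still carries `query` for fluency scoring, or `context` for relevance scoring, is reduced to the accepted fields before being passed as keyword arguments.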

The AdversarialScenario enum no longer includes ADVERSARIAL_INDIRECT_JAILBREAK; invoke IndirectJailbreak or XPIA through IndirectAttackSimulator instead.

Outputs of Simulator and AdversarialSimulator previously had to_eval_qa_json_lines and now have to_eval_qr_json_lines. Where to_eval_qa_json_lines had:
{"question": <user_message>, "answer": <assistant_message>}
to_eval_qr_json_lines now has:
{"query": <user_message>, "response": <assistant_message>}
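JSON Lines files saved with the old method can be rewritten into the new shape with a key rename. This is a minimal sketch, assuming each line holds exactly the `question` and `answer` keys shown above; `qa_to_qr_json_lines` is a hypothetical helper, not an SDK function.

```python
import json


def qa_to_qr_json_lines(qa_lines):
    """Convert question/answer JSON Lines text to query/response JSON Lines.

    Hypothetical helper. Blank lines are skipped; a line missing either
    key raises KeyError.
    """
    converted = []
    for line in qa_lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        converted.append(
            json.dumps({"query": record["question"], "response": record["answer"]})
        )
    return "\n".join(converted)
```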
Bugs Fixed

Non adversarial simulator works with gpt-4o models using the json_schema response format

Fixed an issue where the evaluate API would fail with "[WinError 32] The process cannot access the file because it is being used by another process" when venv folder and target function file are in the same directory.

Fixed evaluate API failure when trace.destination is set to none

Non adversarial simulator now accepts context from the callback

Other Changes

Improved error messages for the evaluate API by enhancing the validation of input parameters. This update provides more detailed and actionable error descriptions.

GroundednessEvaluator now supports query as an optional input in single-turn evaluation. If query is provided, a different prompt template will be used for the evaluation.

To align with our support of a diverse set of models, the following evaluators will now have a new key in their result output without the gpt_ prefix. To maintain backwards compatibility, the old key with the gpt_ prefix will still be present in the output; however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.

... (truncated)
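Code that reads evaluator results can prefer the new key and fall back to the gpt_-prefixed one while both are emitted. This is a sketch under assumptions: `get_metric` is a hypothetical helper, and `some_metric` in the usage note is a placeholder, since the list of affected evaluators is truncated above.

```python
def get_metric(result, name):
    """Read a metric from an evaluator result dict.

    Hypothetical helper. Prefers the un-prefixed key and falls back to the
    legacy gpt_-prefixed key; raises KeyError if neither is present.
    """
    if name in result:
        return result[name]
    return result[f"gpt_{name}"]
```

For example, `get_metric(result, "some_metric")` returns `result["some_metric"]` when it exists and `result["gpt_some_metric"]` otherwise.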

Commits

5a41ba0 azure-ai-evaluation release 1.0.0b5 2024-10-28 (#38138)
3046e7a Multi modal eval fix (#38134)
9592bf7 Clean-up cosmos test pipeline (#38126)
6aae497 [Identity][Monitor] Update live test setup (#37943)
e69815a Generating SDK with model renames (#38108)
5b78782 Multi-Modal-Content-Safety-Evaluators (#38002)
558336a Setting live tests to false temporarily, until DefaultAzureCredential is fixe...
052d6fc Serverless endpoint list failure (#38028)
df44967 Add Distillation SDK and CLI (#37950)
81ad21d [AutoRelease] t2-appcontainers-2024-10-10-38427(can only be merged by SDK own...
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Azure-Samples / ai-rag-chat-evaluator

Bump azure-ai-evaluation from 1.0.0b3 to 1.0.0b5 #110
