Added GroundednessProEvaluator, which is a service-based evaluator for determining response groundedness.
Groundedness detection in Non Adversarial Simulator via query/context pairs
import importlib.resources as pkg_resources
package = "azure.ai.evaluation.simulator._data_sources"
resource_name = "grounding.json"
custom_simulator = Simulator(model_config=model_config)
conversation_turns = []
with pkg_resources.path(package, resource_name) as grounding_file:
with open(grounding_file, "r") as file:
data = json.load(file)
for item in data:
conversation_turns.append([item])
outputs = asyncio.run(custom_simulator(
target=callback,
conversation_turns=conversation_turns,
max_conversation_turns=1,
))
Adding evaluator for multimodal use cases
Breaking Changes
Renamed environment variable PF_EVALS_BATCH_USE_ASYNC to AI_EVALS_BATCH_USE_ASYNC.
RetrievalEvaluator now requires a context input in addition to query in single-turn evaluation.
RelevanceEvaluator no longer takes context as an input. It now only takes query and response in single-turn evaluation.
FluencyEvaluator no longer takes query as an input. It now only takes response in single-turn evaluation.
AdversarialScenario enum does not include ADVERSARIAL_INDIRECT_JAILBREAK, invoking IndirectJailbreak or XPIA should be done with IndirectAttackSimulator
Outputs of Simulator and AdversarialSimulator previously had to_eval_qa_json_lines and now has to_eval_qr_json_lines. Where to_eval_qa_json_lines had:
Non adversarial simulator works with gpt-4o models using the json_schema response format
Fixed an issue where the evaluate API would fail with "[WinError 32] The process cannot access the file because it is being used by another process" when venv folder and target function file are in the same directory.
Fix evaluate API failure when trace.destination is set to none
Non adversarial simulator now accepts context from the callback
Other Changes
Improved error messages for the evaluate API by enhancing the validation of input parameters. This update provides more detailed and actionable error descriptions.
GroundednessEvaluator now supports query as an optional input in single-turn evaluation. If query is provided, a different prompt template will be used for the evaluation.
To align with our support of a diverse set of models, the following evaluators will now have a new key in their result output without the gpt_ prefix. To maintain backwards compatibility, the old key with the gpt_ prefix will still be present in the output; however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps azure-ai-evaluation from 1.0.0b3 to 1.0.0b5.
Release notes
Sourced from azure-ai-evaluation's releases.
... (truncated)
Commits
5a41ba0
azure-ai-evaluation release 1.0.0b5 2024-10-28 (#38138)3046e7a
Multi modal eval fix (#38134)9592bf7
Clean-up cosmos test pipeline (#38126)6aae497
[Identity][Monitor] Update live test setup (#37943)e69815a
Generating SDK with model renames (#38108)5b78782
Multi-Modal-Content-Safety-Evaluators (#38002)558336a
Setting live tests to false temporarily, until DefaultAzureCredential is fixe...052d6fc
Serverless endpoint list failure (#38028)df44967
Add Distillation SDK and CLI (#37950)81ad21d
[AutoRelease] t2-appcontainers-2024-10-10-38427(can only be merged by SDK own...Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase
.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show