Port to new promptflow-evals SDK

Purpose

This PR ports the evaluation code to the promptflow-evals SDK, because the evaluate functionality is being deprecated in azure-ai-generative. Q&A generation still uses azure-ai-generative for now.

Some user-facing changes:

Renamed the custom metrics to "mygroundedness", "myrelevance", and "mycoherence" to make it clear they are not the built-in metrics.
Added built-in metrics: fluency, similarity, f1score.

Does this introduce a breaking change?

[X] Yes - see the renamed custom metrics above. If you previously generated custom metrics, rename the corresponding keys in evalresults.json to the new names.
[ ] No
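
For anyone migrating existing results, the key rename could be scripted roughly as follows. This is a minimal sketch, not part of this PR: the old key names in RENAMES are assumptions (this description does not list them), and rename_keys and migrate_file are hypothetical helpers. It assumes the file is plain JSON and was generated before this change, and it renames matching keys at any nesting depth.

```python
import json
from pathlib import Path

# ASSUMPTION: the old key names below are placeholders. Replace them with the
# custom metric keys that actually appear in your results file.
RENAMES = {
    "groundedness": "mygroundedness",
    "relevance": "myrelevance",
    "coherence": "mycoherence",
}


def rename_keys(node, renames):
    """Return a copy of node with dict keys renamed at any nesting depth."""
    if isinstance(node, dict):
        return {renames.get(key, key): rename_keys(value, renames)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item, renames) for item in node]
    return node  # strings, numbers, booleans, None are returned unchanged


def migrate_file(path, renames=RENAMES):
    """Rewrite the JSON file at path in place with renamed keys."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(json.dumps(rename_keys(data, renames), indent=2),
                    encoding="utf-8")
```

Calling migrate_file("evalresults.json") would rewrite that file in place, so keep a backup first. Keys that are not listed in RENAMES are left untouched.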

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

Ran pytest
Ran evaluate
Ran generate

Azure-Samples / ai-rag-chat-evaluator

Port to new promptflow-evals SDK #85
