Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

Getting nan values in results for metrics #80

Open PavanMahajan777 opened 2 months ago

PavanMahajan777 commented 2 months ago

Please provide us with the following information:

While running the Python snippet below, I am getting NaN values for some metrics. I am using the following input:

```python
question = "What is the capital of France?"
context = "France is in Europe and its capital is Paris."
answer = "Paris is the capital of France."
truth = "Paris"

results = evaluate(
    target=wrap_target,
    data=testdata,
    task_type="qa",
    metrics_list=[
        "gpt_groundedness", "gpt_relevance", "gpt_coherence", "gpt_fluency",
        "gpt_similarity", "hate_unfairness", "sexual", "violence", "self_harm",
    ],
    model_config=model_config,
    data_mapping={
        "question": "question",
        "context": "context",
        "answer": "answer",
    },
    tracking=False,
    output_path="./",
)
```

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Output of `print(results.metrics_summary)`:

```
{'gpt_coherence': nan,
 'gpt_similarity': nan,
 'gpt_fluency': nan,
 'gpt_relevance': 5.0,
 'gpt_groundedness': 5.0,
 'self_harm_defect_rate': nan,
 'violence_defect_rate': nan,
 'hate_unfairness_defect_rate': nan,
 'sexual_defect_rate': nan}
```
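For reference, a quick way to list which metrics came back as NaN (the dict below is copied from the output above, truncated to the GPT metrics; the `math.isnan` check is plain Python, not part of the evaluate SDK):

```python
import math

# Subset of the metrics_summary printed above.
metrics_summary = {
    "gpt_coherence": float("nan"),
    "gpt_similarity": float("nan"),
    "gpt_fluency": float("nan"),
    "gpt_relevance": 5.0,
    "gpt_groundedness": 5.0,
}

# NaN != NaN, so an equality check won't find these; use math.isnan instead.
failed = sorted(
    name for name, value in metrics_summary.items()
    if isinstance(value, float) and math.isnan(value)
)
print(failed)  # ['gpt_coherence', 'gpt_fluency', 'gpt_similarity']
```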

Minimal steps to reproduce

Any log messages given by the failure

Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'

Expected/desired behavior

I get scores for gpt_groundedness and gpt_relevance, but the other metrics come back as NaN.

OS and Version?

Windows 11

Versions

azure-ai-generative[evaluate]==1.0.0b8
promptflow==1.6.0

Mention any other details that might be useful


Thanks! We'll be in touch soon.

HatefulRock commented 2 months ago

I also got NaN values when using the gpt metrics. Try switching to the local custom metrics instead: open your example_config.json file and replace requested_metrics with `"requested_metrics": ["groundedness", "relevance", "coherence", "answer_length", "latency"]`. This solution worked for me.
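If it helps, here is a minimal sketch of making that config change programmatically. Only the file name `example_config.json` and the `requested_metrics` key come from the comment above; the helper function and its name are hypothetical, not part of this repo:

```python
import json

def set_requested_metrics(config_path, metrics):
    """Hypothetical helper: overwrite requested_metrics in a JSON config file."""
    with open(config_path) as f:
        config = json.load(f)
    config["requested_metrics"] = metrics
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

if __name__ == "__main__":
    # Swap the gpt_* metrics for the local custom metrics suggested above.
    set_requested_metrics(
        "example_config.json",
        ["groundedness", "relevance", "coherence", "answer_length", "latency"],
    )
```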