Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters #33

Closed KorbinianBraun4ntt closed 8 months ago

KorbinianBraun4ntt commented 8 months ago

This issue is for a:

- [x] bug report 
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. git clone https://github.com/Azure-Samples/ai-rag-chat-evaluator/
  2. Run python3 -m scripts evaluate --config=example_config.json --numquestions=2

Log messages given by the failure

(INFO) azureml-metrics: [azureml-metrics] ActivityStarted: compute_metrics-qa, ActivityType: ComputeMetrics, CustomDimensions: {'app_name': 'azureml-metrics', 'task_type': 'qa', 'azureml_metrics_run_id': 'XXXXX', 'current_timestamp': 'XXXX'}
(WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_groundedness', 'gpt_coherence', 'gpt_relevance']
(INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 2, 'y_pred_length': 2, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
0%| | 0/2 [00:00<?, ?it/s]
(WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-92: character maps to <undefined>
(ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_groundedness
(ERROR) azureml.metrics.common._scoring: Class: NameError Message: name 'NotFoundError' is not defined
....

Expected/desired behavior

No error: integer values should be returned for the metrics "gpt_groundedness", "gpt_coherence", and "gpt_relevance"

OS and Version?

Windows 10

Versions

azureml-metrics[generative-ai]==0.0.43
azure-ai-generative==1.0.0b2
openai==0.28.1

Other information

Results in eval_results.jsonl: {"question":"...","answer":"...","context":"...","truth":"...","gpt_groundedness":null,"gpt_coherence":null,"gpt_relevance":null}

All relevant files are in UTF-8
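(Editor's note: the files being UTF-8 on disk doesn't help if open() is called without an explicit encoding. On Windows, the default falls back to the locale code page, often cp1252, which cannot represent many characters and raises exactly this 'charmap' UnicodeEncodeError on write. A minimal, self-contained sketch of the safe pattern, using a hypothetical temp file rather than the repo's actual paths:)

```python
import json
import os
import tempfile

# A JSONL record containing non-ASCII characters, similar in shape
# to an eval_results.jsonl row. Without encoding="utf-8", writing
# this on Windows can fail with "'charmap' codec can't encode".
record = {"question": "Größe?", "answer": "42 µm", "gpt_groundedness": None}

path = os.path.join(tempfile.mkdtemp(), "eval_results.jsonl")

# Explicit encoding makes the write platform-independent;
# ensure_ascii=False keeps the characters readable in the file.
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read back with the same explicit encoding.
with open(path, encoding="utf-8") as f:
    loaded = json.loads(f.readline())

print(loaded["question"])
```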

pamelafox commented 8 months ago

I'll try to replicate this today or add more helpful errors. Was this with sample data?

pamelafox commented 8 months ago

Update: I didn't replicate it, but I did realize I'm not explicitly specifying an encoding of "utf-8" when I call open() in various places. I'll send a PR with that change, as it may help; you could also try that change yourself.

pamelafox commented 8 months ago

I've now merged in my change to use encoding="utf-8" everywhere. Could you try that out and see if you're still seeing issues?
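(Editor's note: for readers who want to confirm the failure mode locally, the default text encoding comes from the locale, and Python 3.7+'s UTF-8 mode, enabled via `python -X utf8` or the `PYTHONUTF8=1` environment variable, is another way to force UTF-8 process-wide without touching every open() call. A quick check, a sketch rather than the repo's actual code:)

```python
import locale
import sys

# open() without encoding= uses this value — typically "cp1252" on
# Windows, which is what produces the 'charmap' codec error.
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding:", default_encoding)

# UTF-8 mode forces the default to UTF-8 for the whole process;
# sys.flags.utf8_mode reports whether it is currently enabled.
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```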

KorbinianBraun4ntt commented 8 months ago

Thank you very much for your quick help. It works now 🙂👍

pamelafox commented 8 months ago

Phew! I'll close this, thanks for raising it.