Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

'FileStore' object has no attribute 'get_host_creds' #32

Closed: Niharika6442 closed this issue 4 months ago

Niharika6442 commented 5 months ago

I'm running into an error while running an evaluation.

Niharika6442 commented 5 months ago

2024-01-25 13:04:48 (INFO) scripts: Starting evaluation...
Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'

2024-01-25 19:23:06 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-117: character maps to <undefined>

pamelafox commented 5 months ago

The get_host_creds error always shows up and can be ignored; I've asked the team about removing it.

I think there is an actual error in your output though: " 'charmap' codec can't encode characters in position 6-117"

I'm wondering if there are characters in your input that it isn't handling well. Are you testing non-English languages or emojis or some such?
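For context, that error typically comes from Windows' default locale encoding (often cp1252), which can't represent characters like emoji. A minimal sketch of how it arises and how an explicit encoding avoids it (the filename is just illustrative):

```python
# On Windows, open() without an explicit encoding often defaults to cp1252,
# whose 'charmap' codec can't encode characters such as emoji.
try:
    with open("eval_results.jsonl", "w", encoding="cp1252") as f:
        f.write("answer with emoji \U0001F600")
except UnicodeEncodeError as err:
    print(err)  # 'charmap' codec can't encode character ... maps to <undefined>

# Writing the same text with an explicit UTF-8 encoding succeeds:
with open("eval_results.jsonl", "w", encoding="utf-8") as f:
    f.write("answer with emoji \U0001F600")
```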

Niharika6442 commented 5 months ago

I'm actually trying to evaluate azure-search-openai-demo.

In service_setup.py, I'm having issues configuring an already deployed API. Sample details:

```
"target_url": "https://app-backend-j25rgqsibtmlo.azurewebsites.net/chat"
AZURE_OPENAI_SERVICE = cog-io*****4
AZURE_OPENAI_EVAL_DEPLOYMENT="chat"
```

How can I make changes to the code below?

```python
"api_type": api_type,
"api_base": f"https://{os.environ['AZURE_OPENAI_SERVICE']}.openai.azure.com",
"api_key": api_key,
"api_version": "2023-07-01-preview",
"deployment_id": os.environ["AZURE_OPENAI_EVAL_DEPLOYMENT"],
"model": os.environ["OPENAI_GPT_MODEL"],
```

Exact error:

```
Computing gpt based metrics failed with the exception : HTTP code 404 from API (<!doctype html>
404 Not Found
Not Found
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
)
```
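For context, here is a sketch of how I understand these settings should resolve (values are redacted/assumed, and the model name is a guess):

```python
import os

# Assumed/redacted values from above. The evaluator calls the Azure OpenAI
# *resource* endpoint directly; "target_url" is separate and points at the
# deployed chat app being evaluated.
os.environ["AZURE_OPENAI_SERVICE"] = "cog-io*****4"    # resource name, not a URL
os.environ["AZURE_OPENAI_EVAL_DEPLOYMENT"] = "chat"    # deployment of the eval model
os.environ["OPENAI_GPT_MODEL"] = "gpt-35-turbo"        # assumed model name

# The dict in service_setup.py then resolves api_base to the resource
# endpoint; if that URL doesn't point at a real Azure OpenAI resource,
# the metric calls come back with a 404 like the one above.
api_base = f"https://{os.environ['AZURE_OPENAI_SERVICE']}.openai.azure.com"
print(api_base)  # https://cog-io*****4.openai.azure.com
```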
lgong-rms commented 5 months ago

I was following the instructions in the repo and got the same error:

$ python -m scripts evaluate --config=example_config.json --numquestions=2
2024-02-03 09:58:29 (INFO) scripts: Running evaluation from config D:\git\ai-rag-chat-evaluator\example_config.json
2024-02-03 09:58:29 (INFO) scripts: Replaced results_dir in config with timestamp
2024-02-03 09:58:29 (INFO) scripts: Replaced prompt_template in config with contents of example_input/prompt_refined.txt
2024-02-03 09:58:29 (INFO) scripts: Using Azure OpenAI Service with API Key from AZURE_OPENAI_KEY
2024-02-03 09:58:29 (INFO) scripts: Running evaluation using data from D:\git\ai-rag-chat-evaluator\example_input\qa.jsonl
2024-02-03 09:58:29 (INFO) scripts: Limiting evaluation to 2 questions
2024-02-03 09:58:29 (INFO) scripts: Sending a test question to the target to ensure it is running...
2024-02-03 09:58:35 (INFO) scripts: Successfully received response from target: "question": "What information is in your kn...", "answer": "In our knowledge base, we have...", "context": "Northwind_Standard_Benefits_De..."
2024-02-03 09:58:35 (INFO) scripts: Starting evaluation...
Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-02-03 09:58:43 (INFO) azureml-metrics: Setting max_concurrent_requests to 4 for computing GPT based question answering metrics
2024-02-03 09:58:43 (INFO) azureml-metrics: [azureml-metrics] ActivityStarted: compute_metrics-qa, ActivityType: ComputeMetrics, CustomDimensions: {'app_name': 'azureml-metrics', 'task_type': 'qa', 'azureml_metrics_run_id': '80c3c42a-d95d-44d3-8f4d-da49754ed5ea', 'current_timestamp': '2024-02-03 17:58:43'}
2024-02-03 09:58:43 (WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_coherence', 'gpt_groundedness', 'gpt_relevance']
2024-02-03 09:58:43 (INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 2, 'y_pred_length': 2, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
  0%|          | 0/2 [00:00<?, ?it/s]
2024-02-03 09:58:44 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-76: character maps to <undefined>
2024-02-03 09:58:44 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_coherence
2024-02-03 09:58:44 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
  0%|          | 0/2 [00:00<?, ?it/s]
2024-02-03 09:58:45 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-76: character maps to <undefined>
2024-02-03 09:58:45 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_groundedness
2024-02-03 09:58:45 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
  0%|          | 0/2 [00:00<?, ?it/s]
2024-02-03 09:58:46 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-76: character maps to <undefined>
2024-02-03 09:58:46 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_relevance
2024-02-03 09:58:46 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\azureml\metrics\common\utilities.py:293: RuntimeWarning: Mean of empty slice
  metrics_result[constants.Metric.Metrics][mean_metric_name] = np.nanmean(metric_value)
C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\azureml\metrics\common\utilities.py:294: RuntimeWarning: All-NaN slice encountered
  metrics_result[constants.Metric.Metrics][median_metric_name] = np.nanmedian(metric_value)
2024-02-03 09:58:46 (INFO) azureml-metrics: [azureml-metrics] ActivityCompleted: Activity=compute_metrics-qa, HowEnded=SUCCESS, Duration=3163.11[ms]
Fail writing properties '{'_azureml.evaluate_artifacts': '[{"path": "eval_results.jsonl", "type": "table"}]'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-02-03 09:58:46 (INFO) scripts: Evaluation calls have completed. Calculating overall metrics now...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\git\ai-rag-chat-evaluator\scripts\__main__.py", line 6, in <module>
    app()
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\main.py", line 328, in __call__
    raise e
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\core.py", line 778, in main
    return _main(
           ^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\core.py", line 216, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\git\ai-rag-chat-evaluator\scripts\cli.py", line 27, in evaluate
    run_evaluate_from_config(Path.cwd(), config, numquestions)
  File "D:\git\ai-rag-chat-evaluator\scripts\evaluate.py", line 197, in run_evaluate_from_config
    evaluation_run_complete = run_evaluation(
                              ^^^^^^^^^^^^^^^
  File "D:\git\ai-rag-chat-evaluator\scripts\evaluate.py", line 138, in run_evaluation
    if passes_threshold(question_with_rating[metric_name]):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\git\ai-rag-chat-evaluator\scripts\evaluate.py", line 130, in passes_threshold
    return int(rating) >= 4
           ^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
Exception ignored in: <coroutine object get_async_chat_completion at 0x000002070645ABD0>
Traceback (most recent call last):
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\azureml\metrics\common\llm_connector\async_utils.py", line 36, in get_async_chat_completion
    chat_completion_resp = await openai.ChatCompletion.acreate(**kwargs)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\openai\api_resources\chat_completion.py", line 45, in acreate
    return await super().acreate(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 219, in acreate
    response, _, api_key = await requestor.arequest(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: coroutine ignored GeneratorExit
2024-02-03 09:58:46 (ERROR) asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-2' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() done, defined at C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\tqdm\asyncio.py:75> 
wait_for=<Future pending cb=[Task.__wakeup()]> cb=[as_completed.<locals>._on_completion() at C:\Users\lgong\Anaconda3\envs\py311\Lib\asyncio\tasks.py:602]>
2024-02-03 09:58:46 (ERROR) asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-9' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() running at C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\tqdm\asyncio.py:76> wait_for=<Future pending cb=[Task.__wakeup()]> cb=[as_completed.<locals>._on_completion() at C:\Users\lgong\Anaconda3\envs\py311\Lib\asyncio\tasks.py:602]>
2024-02-03 09:58:46 (ERROR) asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x00000207064F7690>
(py311) 
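For what it's worth, the final TypeError looks like a downstream symptom rather than the root cause: the GPT metric calls failed, so every rating is None by the time passes_threshold() casts it. A defensive sketch of that check (based only on the two lines visible in the traceback, not the repo's actual fix):

```python
def passes_threshold(rating) -> bool:
    # The GPT metric calls failed above, so each rating arrives as None;
    # treating None as "did not pass" avoids the int(None) TypeError.
    if rating is None:
        return False
    return int(rating) >= 4
```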
pamelafox commented 5 months ago

Did you get that error on the sample data or on new data? What operating system and Python version are you running the script from?

lgong-rms commented 5 months ago

I got the error on the sample data, on Windows with Python 3.11.7.

Niharika6442 commented 5 months ago

I have the same issue, using the latest version of the repo.

OS: Windows, Python 3.11.7

pamelafox commented 5 months ago

The 'get_host_creds' message is not an actual error and shouldn't affect the script; I've asked the azure-ai-generative team to remove it.

However, if you were experiencing the charmap encoding issue on Windows, please try pulling the latest main and seeing if the new version works for you.
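If the charmap error still occurs after updating, a generic workaround on Windows (standard Python behavior, not specific to this repo) is to enable Python's UTF-8 mode before running the script:

```
$ export PYTHONUTF8=1    # in cmd.exe: set PYTHONUTF8=1
$ python -m scripts evaluate --config=example_config.json --numquestions=2
```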