Closed Niharika6442 closed 4 months ago
2024-01-25 13:04:48 (INFO) scripts: Starting evaluation... Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-01-25 19:23:06 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-117: character maps to <undefined>
The get_host_creds error always shows and can be ignored; I've asked the team about removing it.
I think there is an actual error in your output though: " 'charmap' codec can't encode characters in position 6-117"
I'm wondering if there are characters in your input that it isn't handling well. Are you testing non-English languages or emojis or some such?
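For context, the 'charmap' message is what Python's legacy Windows code pages (such as cp1252, whose encoder is implemented via "charmap") raise when asked to encode a character outside the code page, e.g. an emoji. A minimal reproduction:

```python
# Minimal reproduction of the 'charmap' codec error from the logs.
# On Windows, open() and console output often default to a legacy code
# page such as cp1252; any character outside it (emoji, many non-Latin
# scripts) fails exactly like the warning above.
text = "coherence score: 5 \u2728"  # U+2728 (sparkles) is not in cp1252

try:
    text.encode("cp1252")
except UnicodeEncodeError as err:
    print(err)  # 'charmap' codec can't encode character '\u2728' ...
```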
I'm actually trying to evaluate azure-search-openai-demo.
In service_setup.py, I'm having issues configuring an already deployed API. Sample details:

**"target_url"**: "https://app-backend-j25rgqsibtmlo.azurewebsites.net/chat"
**AZURE_OPENAI_SERVICE** = cog-io*****4
**AZURE_OPENAI_EVAL_DEPLOYMENT** = "chat"

How can I make changes to the code below?

"api_type": api_type,
"api_base": f"https://{os.environ['AZURE_OPENAI_SERVICE']}.openai.azure.com",
"api_key": api_key,
"api_version": "2023-07-01-preview",
"deployment_id": os.environ["AZURE_OPENAI_EVAL_DEPLOYMENT"],
"model": os.environ["OPENAI_GPT_MODEL"],
Exact error : Computing gpt based metrics failed with the exception : HTTP code 404 from API (<!doctype html>
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
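The fragment above can be read as an (old, pre-1.0 OpenAI library style) Azure OpenAI configuration dict. A self-contained sketch, using the environment variable values reported above as assumed placeholders:

```python
import os

# Hypothetical values from the report above; in a real run these come
# from the shell environment or a .env file, not from the script.
os.environ.setdefault("AZURE_OPENAI_SERVICE", "cog-io-example")
os.environ.setdefault("AZURE_OPENAI_EVAL_DEPLOYMENT", "chat")
os.environ.setdefault("OPENAI_GPT_MODEL", "gpt-35-turbo")  # assumed name

openai_config = {
    "api_type": "azure",            # stood in for `api_type` above
    "api_base": f"https://{os.environ['AZURE_OPENAI_SERVICE']}.openai.azure.com",
    "api_key": "<your-key>",        # stood in for `api_key` above
    "api_version": "2023-07-01-preview",
    "deployment_id": os.environ["AZURE_OPENAI_EVAL_DEPLOYMENT"],
    "model": os.environ["OPENAI_GPT_MODEL"],
}
print(openai_config["api_base"])
```

Note that `api_base` must point at the Azure OpenAI resource itself, not at the deployed app's `/chat` URL, and `deployment_id` must name a deployment that actually exists on that resource; an HTTP 404 from the metrics call usually means that pair doesn't resolve.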
)

I was following the instructions in the repo and got the same error:
$ python -m scripts evaluate --config=example_config.json --numquestions=2
2024-02-03 09:58:29 (INFO) scripts: Running evaluation from config D:\git\ai-rag-chat-evaluator\example_config.json
2024-02-03 09:58:29 (INFO) scripts: Replaced results_dir in config with timestamp
2024-02-03 09:58:29 (INFO) scripts: Replaced prompt_template in config with contents of example_input/prompt_refined.txt
2024-02-03 09:58:29 (INFO) scripts: Using Azure OpenAI Service with API Key from AZURE_OPENAI_KEY
2024-02-03 09:58:29 (INFO) scripts: Running evaluation using data from D:\git\ai-rag-chat-evaluator\example_input\qa.jsonl
2024-02-03 09:58:29 (INFO) scripts: Limiting evaluation to 2 questions
2024-02-03 09:58:29 (INFO) scripts: Sending a test question to the target to ensure it is running...
2024-02-03 09:58:35 (INFO) scripts: Successfully received response from target: "question": "What information is in your kn...", "answer": "In our knowledge base, we have...", "context": "Northwind_Standard_Benefits_De..."
2024-02-03 09:58:35 (INFO) scripts: Starting evaluation...
Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-02-03 09:58:43 (INFO) azureml-metrics: Setting max_concurrent_requests to 4 for computing GPT based question answering metrics
2024-02-03 09:58:43 (INFO) azureml-metrics: [azureml-metrics] ActivityStarted: compute_metrics-qa, ActivityType: ComputeMetrics, CustomDimensions: {'app_name': 'azureml-metrics', 'task_type': 'qa', 'azureml_metrics_run_id': '80c3c42a-d95d-44d3-8f4d-da49754ed5ea', 'current_timestamp': '2024-02-03 17:58:43'}
2024-02-03 09:58:43 (WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_coherence', 'gpt_groundedness', 'gpt_relevance']
2024-02-03 09:58:43 (INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 2, 'y_pred_length': 2, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
0%| | 0/2 [00:00<?, ?it/s]2024-02-03 09:58:44 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-76: character maps to <undefined>
2024-02-03 09:58:44 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_coherence
2024-02-03 09:58:44 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
0%| | 0/2 [00:00<?, ?it/s]2024-02-03 09:58:45 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-76: character maps to <undefined>
2024-02-03 09:58:45 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_groundedness
2024-02-03 09:58:45 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
0%| | 0/2 [00:00<?, ?it/s]2024-02-03 09:58:46 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-76: character maps to <undefined>
2024-02-03 09:58:46 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_relevance
2024-02-03 09:58:46 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\azureml\metrics\common\utilities.py:293: RuntimeWarning: Mean of empty slice
metrics_result[constants.Metric.Metrics][mean_metric_name] = np.nanmean(metric_value)
C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\azureml\metrics\common\utilities.py:294: RuntimeWarning: All-NaN slice encountered
metrics_result[constants.Metric.Metrics][median_metric_name] = np.nanmedian(metric_value)
2024-02-03 09:58:46 (INFO) azureml-metrics: [azureml-metrics] ActivityCompleted: Activity=compute_metrics-qa, HowEnded=SUCCESS, Duration=3163.11[ms]
Fail writing properties '{'_azureml.evaluate_artifacts': '[{"path": "eval_results.jsonl", "type": "table"}]'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-02-03 09:58:46 (INFO) scripts: Evaluation calls have completed. Calculating overall metrics now...
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "D:\git\ai-rag-chat-evaluator\scripts\__main__.py", line 6, in <module>
app()
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\main.py", line 328, in __call__
raise e
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\core.py", line 778, in main
return _main(
^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\core.py", line 216, in _main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\typer\main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^
File "D:\git\ai-rag-chat-evaluator\scripts\cli.py", line 27, in evaluate
run_evaluate_from_config(Path.cwd(), config, numquestions)
File "D:\git\ai-rag-chat-evaluator\scripts\evaluate.py", line 197, in run_evaluate_from_config
evaluation_run_complete = run_evaluation(
^^^^^^^^^^^^^^^
File "D:\git\ai-rag-chat-evaluator\scripts\evaluate.py", line 138, in run_evaluation
if passes_threshold(question_with_rating[metric_name]):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\git\ai-rag-chat-evaluator\scripts\evaluate.py", line 130, in passes_threshold
return int(rating) >= 4
^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
Exception ignored in: <coroutine object get_async_chat_completion at 0x000002070645ABD0>
Traceback (most recent call last):
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\azureml\metrics\common\llm_connector\async_utils.py", line 36, in get_async_chat_completion
chat_completion_resp = await openai.ChatCompletion.acreate(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\openai\api_resources\chat_completion.py", line 45, in acreate
return await super().acreate(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 219, in acreate
response, _, api_key = await requestor.arequest(
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: coroutine ignored GeneratorExit
2024-02-03 09:58:46 (ERROR) asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-2' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() done, defined at C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\tqdm\asyncio.py:75>
wait_for=<Future pending cb=[Task.__wakeup()]> cb=[as_completed.<locals>._on_completion() at C:\Users\lgong\Anaconda3\envs\py311\Lib\asyncio\tasks.py:602]>
2024-02-03 09:58:46 (ERROR) asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-9' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() running at C:\Users\lgong\Anaconda3\envs\py311\Lib\site-packages\tqdm\asyncio.py:76> wait_for=<Future pending cb=[Task.__wakeup()]> cb=[as_completed.<locals>._on_completion() at C:\Users\lgong\Anaconda3\envs\py311\Lib\asyncio\tasks.py:602]>
2024-02-03 09:58:46 (ERROR) asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x00000207064F7690>
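The final TypeError happens because `passes_threshold` receives `None`: the GPT metric calls had already failed, so no ratings were produced. A defensive variant (a sketch of the idea, not the repo's actual fix) would treat a missing or malformed rating as not passing:

```python
def passes_threshold(rating, threshold=4):
    """Return True if a GPT metric rating meets the threshold.

    Ratings can be None (or a non-numeric string) when the metric
    computation itself failed, as in the log above; treat those as
    failing instead of raising TypeError.
    """
    try:
        return int(rating) >= threshold
    except (TypeError, ValueError):
        return False

print(passes_threshold(5))     # True
print(passes_threshold(None))  # False
```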
Did you get that error on the sample data or on new data? What operating system and Python version are you running the script from?
I got the error on the sample data, on Windows with Python 3.11.7.
I have the same issue as well, using the latest version of the repo.
OS: Windows, Python 3.11.7
The 'get_host_creds' error is not an actual error and should not affect the script's operation; I've asked the azure-ai-generative team to remove it.
However, if you were experiencing the charmap encoding issue on Windows, please try pulling the latest main and seeing if the new version works for you.
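For anyone hitting the charmap issue before upgrading, the usual workaround on Windows is to pass `encoding="utf-8"` explicitly when writing result files, since `open()` otherwise defaults to the legacy code page. A general sketch of the pattern (not the repo's exact patch; the file name is illustrative):

```python
import json
import tempfile
from pathlib import Path

# A row containing a character (U+2728) that cp1252 cannot encode.
results = [{"question": "What does \u2728 mean?", "gpt_coherence": 5}]

# encoding="utf-8" avoids the cp1252/'charmap' default that open()
# uses on many Windows setups; ensure_ascii=False keeps the raw
# character in the output instead of a \uXXXX escape.
out_path = Path(tempfile.mkdtemp()) / "eval_results.jsonl"
with open(out_path, "w", encoding="utf-8") as f:
    for row in results:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

print(out_path.read_text(encoding="utf-8").strip())
```

Setting the environment variable `PYTHONUTF8=1` (Python's UTF-8 mode) achieves the same effect process-wide without touching each `open()` call.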
Running through an evaluation error