Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

Evaluate script fails after "Starting evaluation...": 'charmap' codec can't encode characters in position 6-10: character maps to <undefined> #38

Closed sofyan-ajridi-ey closed 5 months ago

sofyan-ajridi-ey commented 5 months ago

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I used azure-search-openai-demo and loaded it with the US Constitution. The app itself works fine. I then used the generate script to produce the following Q/A pairs:

{"question": "What happens if a bill is not returned by the President within ten days, excluding Sundays?", "truth": "If a bill is not returned by the President within ten days (Sundays excepted) after it has been presented to him, it shall become a law in the same manner as if he had signed it.\n[constitution.pdf#page=4]"}
{"question": "What is the exception to this rule regarding a bill becoming a law without the President's signature?", "truth": "The exception to this rule is if Congress adjourns and prevents the bill's return, in which case it shall not become a law.\n[constitution.pdf#page=4]"}
{"question": "What types of orders, resolutions, or votes require the concurrence of both the Senate and House of Representatives?", "truth": "Every order, resolution, or vote that requires the concurrence of both the Senate and House of Representatives, except on a question of adjournment, must be presented to the President of the United States.\n[constitution.pdf#page=4]"}
etc.

When I then try to run the evaluate command, it first sends a test question which goes fine:

2024-02-02 14:54:30 (INFO) scripts: Sending a test question to the target to ensure it is running...
2024-02-02 14:54:39 (INFO) scripts: Successfully received response from target: "question": "What information is in your kn...", "answer": "Our knowledge base includes in...", "context": "constitution.pdf#page=5: by th..."
2024-02-02 14:54:39 (INFO) scripts: Starting evaluation...

But then it fails and I get the following error messages:

2024-02-02 14:57:54 (WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_relevance', 'gpt_coherence', 'gpt_groundedness']
2024-02-02 14:57:54 (INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 20, 'y_pred_length': 20, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]2024-02-02 14:57:56 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-10: character maps to <undefined>
2024-02-02 14:57:56 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_relevance
2024-02-02 14:57:56 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]2024-02-02 14:57:57 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-10: character maps to <undefined>
2024-02-02 14:57:57 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_coherence
2024-02-02 14:57:57 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]2

eval_results.jsonl also contains the following:

{"question":"What happens if a bill is not returned by the President within ten days, excluding Sundays?","answer":"If a bill is not returned by the President within ten days, excluding Sundays, it will become a law as if the President had signed it [constitution.pdf#page=4].","context":"constitution.pdf#page=4: shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their Adjournment prevent its Return, in which Case it shall not be a Law Every Order, Resolution, or Vote to which the Concur- rence of the Senate and House of Representatives may be necessary (except on a question of Adjournment) shall be presented to the President of the United States; and before the Same shall take Effect, shall be approved by him, or be- ing disapproved by him, shall be repassed by two thirds of the Senate and House of Representatives, according to the Rules and Limitations prescribed in the Case of a Bill. SECTION. 
8 The Congress shall have Power To lay and collect Taxes, Duties, Imposts and Excises, to pay the Debts and provide for the common Defence and general Welfare of the United States; but all Duties, Imposts and Excises shall be uniform throughout the United States; To borrow Money on the credit of the United States; To regulate Commerce with foreign Nations, and among the several States, and with the Indian Tribes; To establish an uniform Rule of \n\nconstitution.pdf#page=4: originate in the House of Representatives; but the Senate may propose or concur with Amendments as on other Bills Every Bill which shall have passed the House of Represen- tatives and the Senate, shall, before it become a Law, be presented to the President of the United States; If he ap- prove he shall sign it, but if not he shall return it, with his Objections to that House in which it shall have originated, who shall enter the Objections at large on their Journal, and proceed to reconsider it. If after such Reconsideration two thirds of that House shall agree to pass the Bill, it shall be sent, together with the Objections, to the other House, by which it shall likewise be reconsidered, and if approved by two thirds of that House, it shall become a Law. But in all such Cases the Votes of both Houses shall be determined by Yeas and Nays, and the Names of the Persons voting for and against the Bill shall be entered on the Journal of each House respectively, If any Bill shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their \n\nconstitution.pdf#page=18: written declaration that the President is unable to discharge the powers and duties of his office, the Vice President shall immediately assume the powers and duties of the office as Acting President. 
Thereafter, when the President transmits to the President pro tempore of the Senate and the Speaker of the House of Representatives his written declaration that no inability ex- ists, he shall resume the powers and duties of his office un- less the Vice President and a majority of either the principal officers of the executive department or of such other body as Congress may by law provide, transmit within four days to the President pro tempore of the Senate and the Speaker of the House of Representatives their written declaration that the President is unable to discharge the powers and duties of his office. Thereupon Congress shall decide the issue, assembling within forty-eight hours for that purpose if not in session. If the Congress, within twenty-one days after receipt of the latter written declaration, or, if Congress is not in session, within twenty-one days after Congress is required to assemble, determines by two-thirds vote of both Houses that the President is unable to ","truth":"If a bill is not returned by the President within ten days (Sundays excepted) after it has been presented to him, it shall become a law in the same manner as if he had signed it, unless Congress adjourns and prevents its return, in which case it shall not become a law.\n[constitution.pdf#page=4]","gpt_relevance":null,"gpt_coherence":null,"gpt_groundedness":null}
{"question":"What is the process for an order, resolution, or vote to take effect, which requires the concurrence of the Senate and House of Representatives?","answer":"The process for an order, resolution, or vote to take effect, which requires the concurrence of the Senate and House of Representatives, is as follows:\n\n1. The order, resolution, or vote is presented to the President of the United States.\n2. The President must approve the order, resolution, or vote for it to take effect.\n3. If the President disapproves of the order, resolution, or vote, it must be repassed by two-thirds of the Senate and House of Representatives.\n4. Once the order, resolution, or vote is approved by the President or repassed by two-thirds of Congress, it takes effect.\n\n[constitution.pdf#page=4][constitution.pdf#page=13]","context":"constitution.pdf#page=4: shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their Adjournment prevent its Return, in which Case it shall not be a Law Every Order, Resolution, or Vote to which the Concur- rence of the Senate and House of Representatives may be necessary (except on a question of Adjournment) shall be presented to the President of the United States; and before the Same shall take Effect, shall be approved by him, or be- ing disapproved by him, shall be repassed by two thirds of the Senate and House of Representatives, according to the Rules and Limitations prescribed in the Case of a Bill. SECTION. 
8 The Congress shall have Power To lay and collect Taxes, Duties, Imposts and Excises, to pay the Debts and provide for the common Defence and general Welfare of the United States; but all Duties, Imposts and Excises shall be uniform throughout the United States; To borrow Money on the credit of the United States; To regulate Commerce with foreign Nations, and among the several States, and with the Indian Tribes; To establish an uniform Rule of \n\nconstitution.pdf#page=13: of the United States, directed to the President of the Senate ;- the President of the Senate shall, in the presence of the Senate and House of Represen- tatives, open all the certificates and the votes shall then be counted ;- The person having the greatest number of votes for President, shall be the President, if such number be a majority of the whole number of Electors appointed; and if no person have such majority, then from the persons having the highest numbers not exceeding three on the list of those voted for as President, the House of Representatives shall choose immediately, by ballot, the President. But in choos- ing the President, the votes shall be taken by states, the representation from each state having one vote; a quorum for this purpose shall consist of a member or members from two-thirds of the states, and a majority of all the states shall be necessary to a choice. 
[And if the House of Representa- tives shall not choose a President whenever the right of choice shall devolve upon them, before the fourth day of March next following, then the Vice-President shall act as President, as in case of the death or other constitutional disability of the President .\n\nconstitution.pdf#page=11: for the sole Purpose of receiving, opening and counting the Votes for President; and, that after he shall be chosen, the Congress, together with the President, should, without Delay, proceed to execute this<\/td><\/tr><tr><td>Constitution<\/td><\/tr><tr><td rowSpan=3><\/td><td>By the unanimous Order of the Convention<\/td><\/tr><tr><td>Go. Washington-Presidt:<\/td><\/tr><tr><td>W. JACKSON Secretary.<\/td><\/tr><\/table>  * Language in brackets has been changed by amendment. CONSTITUTION OF THE UNITED STATES THE AMENDMENTS TO THE CONSTITUTION OF THE UNITED STATES AS RATIFIED BY THE STATES Preamble to the Bill of Rights Congress of the United States begun and held at the City of New-York, on Wednesday the fourth of March, THE Conventions of a number of the States, having at the time of their adopting the Constitution, expressed a desire, in order to prevent misconstruction or abuse of its powers, that further declaratory and restrictive clauses should be added: And as extending the ground of public confidence in the Government, will best ensure the beneficent ends of its institution RESOLVED by the Senate and House of Representatives of the United States of America, in Congress assembled, two ","truth":"An order, resolution, or vote that requires the concurrence of the Senate and House of Representatives must be presented to the President of the United States. 
It shall take effect only if approved by the President, or if disapproved, it must be repassed by two-thirds of the Senate and House of Representatives according to the rules and limitations prescribed in the case of a bill.\n[constitution.pdf#page=4]","gpt_relevance":null,"gpt_coherence":null,"gpt_groundedness":null}
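
A quick way to see how many records came back unscored is to count the rows whose metric fields are null. This is a minimal sketch, assuming the three metric names shown in the records above; the `count_unscored` helper and the two sample rows are made up for illustration and are not part of the evaluator.

```python
import json

# Metric names taken from the records above.
METRICS = ("gpt_relevance", "gpt_coherence", "gpt_groundedness")


def count_unscored(lines, metrics=METRICS):
    """Return (total_records, records_with_any_null_metric) for JSONL lines."""
    total = 0
    unscored = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        total += 1
        # A missing key and an explicit null are both treated as "not scored".
        if any(record.get(name) is None for name in metrics):
            unscored += 1
    return total, unscored


# Hypothetical, shortened records (not real evaluation output).
sample = [
    '{"question": "q1", "gpt_relevance": null, "gpt_coherence": null, "gpt_groundedness": null}',
    '{"question": "q2", "gpt_relevance": 5, "gpt_coherence": 4, "gpt_groundedness": 5}',
]
print(count_unscored(sample))
```

To run it against a real results file, pass an open file handle instead of the list, opened with `encoding="utf-8"`; the exact location of `eval_results.jsonl` depends on your configuration.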

Any log messages given by the failure

See above

Expected/desired behavior

Should generate a result for the evaluation

OS and Version?

Windows 10

Versions

Mention any other details that might be useful



pamelafox commented 5 months ago

Can you confirm you have the most recent version of this repo? I added "encoding=utf-8" to all the file open calls to fix a similar issue, and want to make sure that's the version you're using.
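
For readers unfamiliar with why the `encoding` argument matters: on Windows, `open()` without an explicit encoding typically falls back to a legacy code page such as cp1252, which cannot represent many Unicode characters. The sketch below is not the repo's code; the `write_with` helper and the sample text are assumptions, and cp1252 is forced explicitly so the behavior is the same on any OS.

```python
import os
import tempfile

# Sample text containing U+2192 (rightwards arrow), which cp1252 cannot represent.
text = "Sundays excepted \u2192 page 4"


def write_with(encoding):
    """Write `text` to a temporary file using `encoding`; return True on success."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return True
    except UnicodeEncodeError as exc:
        # For cp1252 the message starts with "'charmap' codec can't encode ..."
        print(f"{encoding}: {exc}")
        return False
    finally:
        os.remove(path)


print(write_with("utf-8"))   # explicit UTF-8 handles any Unicode text
print(write_with("cp1252"))  # the legacy Windows code page fails on the arrow
```

Passing `encoding="utf-8"` to every `open()` call removes the dependency on the machine's default code page, but only for files the script opens itself, not for files opened inside third-party libraries.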

sofyan-ajridi-ey commented 5 months ago

I just double-checked: everything is up to date, including "encoding=utf-8" in all the open() calls.

pamelafox commented 5 months ago

Hm. I've tested with the sample data you gave here, and am unable to replicate the error. Does the error happen if you only do the first two questions? Or does it happen on a later question? I'm trying to figure out if the issue is with the encoding/characters of your input data or of the target endpoint response.
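
One way to narrow this down is to scan both the input questions and the target's answers for characters that the Windows default code page cannot encode. The following is a rough sketch under assumptions: `find_unencodable` is a hypothetical helper, cp1252 is assumed as the legacy code page, and the `answer` string is invented for illustration.

```python
def find_unencodable(text, codec="cp1252"):
    """Return (index, character) pairs for characters that `codec` cannot encode."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


question = "What happens if a bill is not returned by the President within ten days?"
# Invented answer text containing U+2192, which cp1252 cannot represent.
answer = "It becomes law \u2192 see [constitution.pdf#page=4]"

print(find_unencodable(question))  # plain ASCII, so nothing is reported
print(find_unencodable(answer))    # reports the position of the arrow
```

If the input file comes back clean but the responses do not, the problematic characters are arriving from the target endpoint or from a prompt template rather than from the ground-truth data.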

lgong-rms commented 5 months ago

I got the same issue and posted the messages here: https://github.com/Azure-Samples/ai-rag-chat-evaluator/issues/32#issuecomment-1925417941

KorbinianBraun4ntt commented 5 months ago

Check that your dev container runs properly. That, together with specifying UTF-8, fixed my problem. This is a compatibility problem between Windows and Linux (WSL should work).

sofyan-ajridi-ey commented 5 months ago

> Hm. I've tested with the sample data you gave here, and am unable to replicate the error. Does the error happen if you only do the first two questions? Or does it happen on a later question? I'm trying to figure out if the issue is with the encoding/characters of your input data or of the target endpoint response.

It happens instantly (I assume after the first question). I tested it out with another set of questions and I have the same issue.

sofyan-ajridi-ey commented 5 months ago

> Check that your dev container runs properly. That, together with specifying UTF-8, fixed my problem. This is a compatibility problem between Windows and Linux (WSL should work).

I'd like to try it out, but I'm working on a corporate PC without a Docker license.

pamelafox commented 5 months ago

Could any of you try adding this line after requests.post() in evaluate.py?

        r = requests.post(target_url, headers=headers, json=body)
        r.encoding = "utf-8"

It explicitly sets the encoding to UTF-8. I'm hoping the issue is that it's detecting a different encoding on Windows, and we just need to override that to specify UTF-8. Unfortunately I'm on a Mac so I have yet to be able to replicate it personally.
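
In requests, `Response.text` decodes the raw `Response.content` bytes using `Response.encoding`, so overriding that attribute changes how the same bytes are turned into a string. The sketch below illustrates the effect with the standard library only; the `decode_body` helper and the sample payload are assumptions, not code from `evaluate.py`.

```python
def decode_body(raw, encoding):
    """Decode response bytes the way a client would for a given declared encoding."""
    return raw.decode(encoding)


# Sample text with a typographic apostrophe (U+2019), sent over the wire as UTF-8.
original = "the bill\u2019s return"
raw = original.encode("utf-8")

as_utf8 = decode_body(raw, "utf-8")
as_latin1 = decode_body(raw, "iso-8859-1")  # never raises, but produces mojibake

print(as_utf8 == original)    # UTF-8 round-trips the text unchanged
print(as_latin1 == original)  # the wrong codec yields one character per byte instead
print(len(original), len(raw), len(as_latin1))
```

A wrong guess on the decoding side silently corrupts the text rather than raising, which is why forcing UTF-8 was a reasonable thing to try even though it did not resolve this particular error.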

sofyan-ajridi-ey commented 5 months ago

Sadly, that didn't fix it. Same error:

2024-02-05 12:58:17 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-47: character maps to <undefined>
2024-02-05 12:58:17 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_coherence
2024-02-05 12:58:17 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined

kuppan4109 commented 5 months ago

I am also facing the same issue.

- OS Type: Windows
- IDE: Visual Studio Code
- Python Version: 3.10


I am executing the script locally without the container. Both my RAG chat service and the evaluation script run on my local machine.

Command: python -m scripts evaluate --config=example_config.json --numquestions=2

KorbinianBraun4ntt commented 5 months ago

> I am also facing the same issue.
>
> - OS Type: Windows
> - IDE: Visual Studio Code
> - Python Version: 3.10
>
> I am executing the script locally without the container. Both my RAG chat service and the evaluation script run on my local machine.
>
> Command: python -m scripts evaluate --config=example_config.json --numquestions=2

You must run the RAG eval scripts in a dev container (Ubuntu or similar); otherwise it will not work. The easiest way is to pull the repository directly via VS Code, which should start a dev container automatically. Otherwise, start a container yourself and run the evaluation there.
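
To compare a Windows shell with a dev container, it can help to print the encodings Python picks by default in each environment. This is a small diagnostic sketch using only the standard library; `encoding_report` is a hypothetical helper name.

```python
import locale
import sys


def encoding_report():
    """Collect the default encodings this Python process will use."""
    return {
        "platform": sys.platform,
        # Encoding used by open() when no `encoding` argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # True when Python runs in UTF-8 mode (for example with PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

On Linux containers the preferred encoding is usually UTF-8, while on Windows it is often a legacy code page such as cp1252. Python 3.7+ also offers a UTF-8 mode, enabled by setting the environment variable `PYTHONUTF8=1`, which makes UTF-8 the default; nobody in this thread reports testing whether that avoids the error here.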

pamelafox commented 5 months ago

Okay, I would like to get this working outside of a dev container as well, so I will see if I can work with a colleague with a Windows machine to find a fix.

kuppan4109 commented 5 months ago

> Okay, I would like to get this working outside of a dev container as well, so I will see if I can work with a colleague with a Windows machine to find a fix.

Hey @pamelafox, any updates on this?

pamelafox commented 5 months ago

I now have a Windows machine! I'm working on replicating the issue now.

pamelafox commented 5 months ago

Okay, so I replicated the encoding error, and then I merged my most recent PR that upgraded the azure-ai-generative SDK, and now I no longer see the error. Can you all try the latest main and see if it's working for you?
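
After pulling the latest main and reinstalling the requirements, you can confirm which SDK version your environment actually picked up. A minimal sketch, assuming the pip distribution name matches the SDK name mentioned above (`azure-ai-generative`); `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumed distribution name; prints None if the package is not installed.
print(installed_version("azure-ai-generative"))
```

Comparing that output with the version pinned in the repo's requirements file shows whether the upgrade has taken effect in the environment you run the evaluation from.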

sofyan-ajridi-ey commented 5 months ago

Good news, latest PR fixes the issue on my end! Thank you for your quick response.

pamelafox commented 5 months ago

Phew! Closing this. Thanks for confirming!

kuppan4109 commented 5 months ago

@pamelafox @sofyan-ajridi-ey - The latest merge fixed my issue. Thanks, guys!

Niharika6442 commented 5 months ago

This issue is fixed on my end as well. Thanks @pamelafox