Closed: AnthoSocofer closed this issue 5 months ago.
Hi,
I designed the `clean_references` function to handle typical English text. I call it after retrieving the text from the vectordb and before passing the text to the GPT model. I am not sure what type of text you are retrieving from the vectordb, but whatever it is, the function cannot handle it. A quick and easy fix would be to give the `clean_references` function to ChatGPT and ask it to update the function so it can clean your text properly. This modification worked well for me:
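```python
try:
    # Attempt to encode to latin-1, then decode back to utf-8
    content = content.encode('latin1').decode('utf-8', 'ignore')
except UnicodeEncodeError:
    # Some character (e.g. '\uf0b7') has no latin-1 byte; this round trip keeps the text unchanged
    content = content.encode('unicode_escape').decode('unicode_escape')
```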
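As a minimal, self-contained sketch of what this modification does, the snippet below wraps it in a hypothetical helper `fix_mojibake` (the name is illustrative, not part of the project). The latin-1 to UTF-8 round trip repairs text whose UTF-8 bytes were mis-decoded as latin-1. Strings containing a character above U+00FF, such as `'\uf0b7'`, make `encode('latin1')` raise and fall through to the `except` branch. Since the `unicode_escape` encode/decode pair there returns the same string, the sketch simply returns `content`.

```python
def fix_mojibake(content: str) -> str:
    """Hypothetical helper wrapping the latin-1 -> UTF-8 round trip suggested above."""
    try:
        # Treat each code point <= U+00FF as a raw byte, then decode those bytes as UTF-8,
        # silently dropping any byte sequences that are not valid UTF-8.
        return content.encode("latin1").decode("utf-8", "ignore")
    except UnicodeEncodeError:
        # At least one character (e.g. U+F0B7) has no latin-1 byte, so keep the text as-is.
        return content


print(fix_mojibake("cafÃ©"))  # café
print(repr(fix_mojibake("\uf0b7 bullet item")))  # returned unchanged instead of raising
```

One caveat of the `'ignore'` error handler: text that is already correct but contains latin-1 characters gets damaged. For example, `fix_mojibake("café")` returns `"caf"`, because the single byte `0xE9` is not valid UTF-8 on its own.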
Thanks, Farzad! By the way, great work. This project is quite inspiring.
Hello,
When chatting with the bot, I often encounter this error:
```
File "C:\Repos\AI_project\Demo\demo_2024_05_02\RAG_GPT_OpenAI\src\utils\chatbot.py", line 60, in respond
    retrieved_content = ChatBot.clean_references(docs)
File "C:\Repos\AI_project\Demo\demo_2024_05_02\RAG_GPT_OpenAI\src\utils\chatbot.py", line 117, in clean_references
    content = content.encode('latin1').decode('utf-8', 'ignore')
UnicodeEncodeError: 'latin-1' codec can't encode character '\uf0b7' in position 383: ordinal not in range(256)
```
Do you happen to know how to solve this issue?
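To see which characters trigger this error, a quick check like the following may help. It is a sketch, and `find_non_latin1` is a hypothetical helper, not part of the project. latin-1 only covers code points 0 to 255, which is what `ordinal not in range(256)` reports. `'\uf0b7'` is a Private Use Area code point (0xF0B7 = 61623) that often shows up where bullet symbols were extracted from PDF or Word files.

```python
def find_non_latin1(text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: list (position, character) pairs that latin-1 cannot encode."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFF]


sample = "Key points:\n\uf0b7 first item"
try:
    sample.encode("latin1")  # the same call that fails in clean_references
except UnicodeEncodeError as err:
    # err.start is the index of the first character latin-1 cannot encode
    print(err.reason, hex(ord(sample[err.start])))  # ordinal not in range(256) 0xf0b7

print(find_non_latin1(sample))  # [(12, '\uf0b7')]
```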