alejandro-ao / ask-multiple-pdfs

A Langchain app that allows you to chat with multiple PDFs
1.63k stars 933 forks

UnicodeEncodeError: 'latin-1' codec can't encode character '\u03c0' in position 13: ordinal not in range(256) #6

Open Dimayakoub opened 1 year ago

Dimayakoub commented 1 year ago

When trying to run the app in a Windows environment, I'm receiving this error, even with a PDF file that contains only 2 words:

File "D:\Python\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "C:\Users\Ahmad Issmail\Desktop\TST\ask-multiple-pdfs-main\app.py", line 105, in <module>
    main()
File "C:\Users\Ahmad Issmail\Desktop\TST\ask-multiple-pdfs-main\app.py", line 97, in main
    vectorstore = get_vectorstore(text_chunks)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Ahmad Issmail\Desktop\TST\ask-multiple-pdfs-main\app.py", line 36, in get_vectorstore
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

aabalke33 commented 1 year ago

I think you'll need to encode the text as UTF-8. \u03c0 is the escape for the Unicode character U+03C0 (the Greek letter π), which Latin-1 cannot represent, so some step in the workflow is probably encoding text as Latin-1. See the following: https://stackoverflow.com/questions/3942888/unicodeencodeerror-latin-1-codec-cant-encode-character/12064483#12064483
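
For anyone hitting the same message, here is a minimal sketch of the failure mode it describes. It only reproduces the error in isolation and shows one way to locate the offending characters; it does not claim to identify which step of the app triggers it. The sample string and the helper `find_non_latin1` are hypothetical names made up for illustration and are not part of the repository.

```python
# Hypothetical sample text containing the Greek letter pi (U+03C0).
text = "area = \u03c0 r^2"


def find_non_latin1(s):
    """Return (index, character) pairs that the Latin-1 codec cannot encode.

    Latin-1 covers code points 0..255 only, so anything above that fails.
    """
    return [(i, ch) for i, ch in enumerate(s) if ord(ch) > 255]


# Encoding as Latin-1 raises the same kind of error reported in this issue.
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 failed:", exc)

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")
print("utf-8 bytes:", utf8_bytes)

# Locate the characters that would break a Latin-1 encode.
print("offending characters:", find_non_latin1(text))
```

Running a check like `find_non_latin1` over any string that ends up being encoded (extracted text, settings, and so on) may help narrow down where the character comes from.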