Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0

[BUG] GraphRAG issue: Error: 'charmap' codec can't encode character '\u25e6' in position 959: character maps to <undefined> #373

Open vigkastHV opened 2 days ago

vigkastHV commented 2 days ago

Description

System: Windows 11, Python 3.11.9

GraphRAG gives the following error while uploading a file, and no error is logged in the command prompt.

Error - Error: 'charmap' codec can't encode character '\u25e6' in position 959: character maps to <undefined>
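
For readers unfamiliar with this message: Python raises it when text is encoded with a legacy single-byte code page that has no slot for the character. The sketch below reproduces the same class of error outside kotaemon. It assumes the code page involved is cp1252 (a common Windows default); the log above does not confirm which code page or which library call is responsible. The helper name `try_encode` and the sample string are made up for illustration.

```python
import os
import tempfile

# U+25E6 (WHITE BULLET) is the character named in the error message.
text = "item \u25e6 sub-item"


def try_encode(s, encoding):
    """Return "ok" if s can be encoded, otherwise the error message."""
    try:
        s.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


# Assumption: cp1252 stands in for the Windows default code page.
cp1252_result = try_encode(text, "cp1252")
utf8_result = try_encode(text, "utf-8")
print(cp1252_result)
print(utf8_result)

# Passing encoding="utf-8" explicitly avoids depending on the platform default.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)
with open(path, encoding="utf-8") as fh:
    round_trip = fh.read()
```

If the failing write is inside a third-party library and cannot be edited, starting Python in UTF-8 mode (setting the `PYTHONUTF8=1` environment variable before `python app.py`) makes UTF-8 the default for `open()`. Whether that resolves this particular report has not been verified here.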

Reproduction steps

1. git clone https://github.com/Cinnamon/kotaemon && cd kotaemon
2. pip install graphrag
3. pip install -e "libs/kotaemon[all]"
4. pip install -e "libs/ktem"
5. python app.py

The error is visible only in the browser, not in the command prompt.

Screenshots

Browser screenshot of the error: [image: Kotaemon-Error]

Logs

$ python app.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1728460560.732284   11824 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_client, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
sagemaker.config INFO - Not applying SDK defaults from location: C:\ProgramData\sagemaker\sagemaker\config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: C:\Users\vkasturi\AppData\Local\sagemaker\sagemaker\config.yaml
User "admin" already exists
Setting up quick upload event
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
User-id: None, can see public conversations: False
User-id: 1, can see public conversations: True
use_quick_index_mode False
reader_mode default
Using reader <kotaemon.loaders.pdf_loader.PDFThumbnailReader object at 0x0000015E8A34DED0>
C:\Users\vkasturi\AppData\Local\Programs\Python\Python311\Lib\site-packages\pypdf\_crypt_providers\_cryptography.py:32: CryptographyDeprecationWarning:

ARC4 has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.ARC4 and will be removed from this module in 48.0.0.

Page numbers: 6
Got 6 page thumbnails
Adding documents to doc store
indexing step took 0.7975306510925293

Browsers

No response

OS

Windows

Additional information

No response

Jainbaba commented 2 days ago

This is actually an issue with the llama_index SimpleDirectoryReader class. Please refer to this issue and the comments on it:

https://github.com/run-llama/llama_index/issues/10683

Make sure llama_index and llama_index_core are both updated to the latest version.
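
A quick way to see which versions are currently installed before upgrading is sketched below. It assumes the PyPI distribution names are `llama-index` and `llama-index-core` (the hyphenated forms of the names above); `installed_version` is a hypothetical helper, not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed distribution names; adjust if your environment uses different ones.
for name in ("llama-index", "llama-index-core"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either one is out of date, `pip install -U llama-index llama-index-core` should bring both to the latest release, under the same naming assumption.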