None of the encoding strategies work. Windows users are not welcome.
I have tried all kinds of UTF-8 encoding settings: adding it to the .py files and setting it as the default in the environment.
Please consider that many users are on Windows 10/11.
Traditional RAG works well, but GraphRAG isn't working right. No output is generated after indexing. Sad.
For every PDF I upload:
Indexing [1/1]: semRegularized.pdf
=> Converting semRegularized.pdf to text
=> Converted semRegularized.pdf to text
=> [semRegularized.pdf] Processed 44 chunks
=> Finished indexing semRegularized.pdf
Error: 'gbk' codec can't encode character '\xa9' in position 127: illegal multibyte sequence
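The character in that message is U+00A9 (the copyright sign), and the traceback further down fails on U+280B (a braille pattern, likely a progress-spinner frame). A minimal sketch to confirm that neither character can be encoded as GBK while both encode fine as UTF-8. `can_encode` is a hypothetical helper written for this check, not part of kotaemon or GraphRAG:

```python
# Characters copied from the two error messages in this report.
COPYRIGHT = "\xa9"   # from the PDF indexing error
SPINNER = "\u280b"   # from the rich traceback


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in (COPYRIGHT, SPINNER):
    print(
        f"U+{ord(ch):04X}: "
        f"gbk={can_encode(ch, 'gbk')} utf-8={can_encode(ch, 'utf-8')}"
    )
```

This only shows that the failure comes from the output encoding being GBK, not from the uploaded file being corrupt.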
For every text file I upload:
D:\anaconda3\envs\kotaemon\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
Traceback (most recent call last):
File "D:\anaconda3\envs\kotaemon\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\anaconda3\envs\kotaemon\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\anaconda3\envs\kotaemon\lib\site-packages\graphrag\index\__main__.py", line 104, in <module>
index_cli(
File "D:\anaconda3\envs\kotaemon\lib\site-packages\graphrag\index\cli.py", line 178, in index_cli
progress_reporter.stop()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\graphrag\index\progress\rich.py", line 119, in stop
self._live.stop()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\live.py", line 147, in stop
with self.console:
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 864, in __exit__
self._exit_buffer()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 822, in _exit_buffer
self._check_buffer()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 2024, in _check_buffer
self._write_buffer()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 2060, in _write_buffer
legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\_windows_renderer.py", line 19, in legacy_windows_render
term.write_text(text)
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\_win32_console.py", line 403, in write_text
self.write(text)
UnicodeEncodeError: 'gbk' codec can't encode character '\u280b' in position 0: illegal multibyte sequence
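The last frame of the traceback is a plain `write` on the output stream, so the exception is raised by a text stream whose encoding is GBK. A small sketch of that behaviour using an in-memory stream in place of the real console; `write_through` is a hypothetical helper and nothing here is taken from rich or GraphRAG:

```python
import io


def write_through(encoding: str, errors: str, text: str) -> bytes:
    """Write `text` to an in-memory text stream and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


SPINNER = "\u280b"  # the character named in the traceback

# A strict GBK stream raises, as in the traceback.
try:
    write_through("gbk", "strict", SPINNER)
except UnicodeEncodeError as exc:
    print("strict gbk stream fails:", exc.reason)

# With errors="replace" the stream degrades to "?" instead of crashing.
print(write_through("gbk", "replace", SPINNER))

# A UTF-8 stream writes the character without loss.
print(write_through("utf-8", "strict", SPINNER))
```

In a live process the closest equivalent is `sys.stdout.reconfigure(encoding="utf-8", errors="replace")`, which exists on Python 3.7+ when `sys.stdout` is a regular text stream. Whether that call can run early enough to affect the GraphRAG indexing step is not something this report establishes.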
Logs
D:\anaconda3\envs\kotaemon\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
Traceback (most recent call last):
File "D:\anaconda3\envs\kotaemon\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\anaconda3\envs\kotaemon\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\anaconda3\envs\kotaemon\lib\site-packages\graphrag\index\__main__.py", line 104, in <module>
index_cli(
File "D:\anaconda3\envs\kotaemon\lib\site-packages\graphrag\index\cli.py", line 178, in index_cli
progress_reporter.stop()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\graphrag\index\progress\rich.py", line 119, in stop
self._live.stop()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\live.py", line 147, in stop
with self.console:
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 864, in __exit__
self._exit_buffer()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 822, in _exit_buffer
self._check_buffer()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 2024, in _check_buffer
self._write_buffer()
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\console.py", line 2060, in _write_buffer
legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\_windows_renderer.py", line 19, in legacy_windows_render
term.write_text(text)
File "D:\anaconda3\envs\kotaemon\lib\site-packages\rich\_win32_console.py", line 403, in write_text
self.write(text)
UnicodeEncodeError: 'gbk' codec can't encode character '\u280b' in position 0: illegal multibyte sequence
use_quick_index_mode False
reader_mode default
Using reader <kotaemon.loaders.pdf_loader.PDFThumbnailReader object at 0x000001E2A1E84130>
Page numbers: 22
Got 22 page thumbnails
Adding documents to doc store
indexing step took 6.375478029251099
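The traceback starts in `runpy._run_module_as_main`, which means `graphrag.index` was run as a module (`python -m ...`). If kotaemon launches it as a child process (an assumption, the logs do not show the launch code), encoding changes made inside the app's own .py files would not reach it, while environment variables would. A sketch of how the standard `PYTHONIOENCODING` and `PYTHONUTF8` variables change a child interpreter's stdout encoding; `child_stdout_encoding` is a hypothetical helper and the child command is a stand-in for the real indexing command:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env: dict) -> str:
    """Start a child Python with extra environment variables and
    return the encoding that child reports for its own stdout."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Baseline: whatever the child picks for a piped stdout on this machine.
print("default:         ", child_stdout_encoding({}))

# Force the stdio encoding only.
print("PYTHONIOENCODING:", child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))

# Enable Python's UTF-8 mode (Python 3.7+).
print("PYTHONUTF8:      ", child_stdout_encoding({"PYTHONUTF8": "1"}))
```

Either variable has to be set before the interpreter starts, for example in the shell that launches the app, so that child processes inherit it.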
Browsers
Chrome
OS
Windows
Additional information
No response