Semantra reports a GBK encoding error when processing a PDF. I've already run chcp 65001 to switch the console to UTF-8, but the error still occurs.
C:\Users\Ye>semantra C:\Users\Ye\Documents\jqac033.pdf
jqac033.pdf: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Ye\.local\bin\semantra.exe\__main__.py", line 7, in <module>
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\semantra\semantra.py", line 594, in main
    documents[fn] = process(
                    ^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\semantra\semantra.py", line 146, in process
    content = get_text_content(md5, filename, semantra_dir, force, silent)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\semantra\semantra.py", line 45, in get_text_content
    return get_pdf_content(md5, filename, semantra_dir, force, silent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ye\.local\pipx\venvs\semantra\Lib\site-packages\semantra\pdf.py", line 79, in get_pdf_content
    position += f.write(pagetext)
                ^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'gbk' codec can't encode character '\ufffe' in position 1353: illegal multibyte sequence
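For reference, here is a minimal sketch of what seems to be going on. It assumes (I have not checked the semantra source) that the text file written in pdf.py is opened without an explicit encoding. In that case Python uses the system locale encoding, which is GBK on a Chinese Windows install, and chcp 65001 only changes the console code page, not that default. The helper name write_text and the sample string are hypothetical. Only the character U+FFFE is taken from the traceback.

```python
import os
import tempfile

# U+FFFE is the character named in the traceback; the rest of the string is made up.
page_text = "sample page text \ufffe"


def write_text(path, text, encoding):
    """Write text using an explicit encoding and return the number of characters written."""
    with open(path, "w", encoding=encoding) as f:
        return f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.txt")

    # Forcing GBK reproduces the failure: the codec has no mapping for U+FFFE.
    try:
        write_text(path, page_text, "gbk")
        gbk_ok = True
    except UnicodeEncodeError:
        gbk_ok = False

    # The same text can be written once the file is opened as UTF-8.
    utf8_written = write_text(path, page_text, "utf-8")
```

If that assumption holds, passing encoding="utf-8" in the open() call would likely avoid the error. As a possible workaround without touching the code, setting the environment variable PYTHONUTF8=1 before running semantra enables Python's UTF-8 mode, which makes open() default to UTF-8.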