Open HuyLe82US opened 1 month ago
I found out the cause and here is the solution from ChatGPT:
Install Poppler:
Windows: download Poppler and extract it to a folder (e.g., C:\poppler).
macOS: brew install poppler
Linux (Debian/Ubuntu): sudo apt-get install poppler-utils
Add Poppler to System PATH:
If you're on Windows, you'll need to add the bin folder from the Poppler installation to your system's PATH. In the system environment variables, select the Path variable and click Edit, then add the Poppler bin directory (e.g., C:\poppler\bin).
Verify Poppler Installation:
After installing Poppler and adding it to the PATH, verify that it’s correctly set up by running the following command in your terminal (command prompt or shell):
pdftoppm -h
This should display help information for pdftoppm, one of the tools included with Poppler. If you see this, Poppler is correctly installed and added to the PATH.
Retry Running Your Script:
After ensuring Poppler is installed and available in the PATH, retry running your Python script. The error related to Poppler should be resolved.
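The verification step above can also be done from Python, which checks the same PATH that the script itself will see. This is only a minimal sketch: find_poppler and its extra_dir argument are hypothetical names introduced for illustration, and it uses nothing beyond the standard library.

```python
import os
import shutil


def find_poppler(extra_dir=None):
    """Return the full path to pdftoppm, or None if it is not on PATH.

    extra_dir is an optional, hypothetical Poppler bin folder. If given,
    it is prepended to PATH for the current process only.
    """
    if extra_dir:
        os.environ["PATH"] = extra_dir + os.pathsep + os.environ.get("PATH", "")
    # shutil.which searches PATH the same way the shell would.
    return shutil.which("pdftoppm")


if __name__ == "__main__":
    location = find_poppler()
    if location:
        print(f"pdftoppm found at {location}")
    else:
        print("pdftoppm not found; check the Poppler install and PATH")
```

If this prints "not found" while pdftoppm -h works in a terminal, the IDE or virtual environment may have been started before PATH was changed, so restarting it is worth trying.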
If you still encounter issues, make sure:
The file_path provided to your script is correct.
After fixing that, I have another issue with encoding:
Traceback (most recent call last):
File "C:\Users\PycharmProjects\pythonProject\testZerox.py", line 22, in <module>
asyncio.run(main())
File "C:\Users\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\PycharmProjects\pythonProject\testZerox.py", line 15, in main
result = await zerox(file_path=file_path, model=model, output_dir=output_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\PycharmProjects\pythonProject\venv\Lib\site-packages\pyzerox\core\zerox.py", line 169, in zerox
await f.write("\n\n".join(aggregated_markdown))
File "C:\Users\PycharmProjects\pythonProject\venv\Lib\site-packages\aiofiles\threadpool\utils.py", line 43, in method
return await self._loop.run_in_executor(self._executor, cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1edc' in position 1: character maps to <undefined>
I've already set PYTHONIOENCODING=utf-8 in System Variables.
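A note on why that setting may not help: PYTHONIOENCODING only affects stdin, stdout, and stderr, while the traceback shows the failure inside a file write that falls back to the Windows default codec (cp1252). The sketch below reproduces the error with the same character and shows that an explicit encoding="utf-8" on the file avoids it. The can_encode helper and the out.md file name are made up for illustration. Whether pyzerox exposes an encoding option is not shown in this thread, so enabling Python's UTF-8 mode (environment variable PYTHONUTF8=1) may be the simpler workaround.

```python
import os
import tempfile

text = "\u1edc"  # the character named in the UnicodeEncodeError above


def can_encode(s, encoding):
    """Return True if s can be encoded with the given codec."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(text, "cp1252"))  # False: cp1252 has no mapping for it
print(can_encode(text, "utf-8"))   # True

# Writing with an explicit encoding does not depend on the system default.
path = os.path.join(tempfile.mkdtemp(), "out.md")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```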
@HuyLe82US, please don't follow the instructions at the link shared above by ummm288.
@tylermaran, @annapo23 please block the previous comment; the link contains malware.
Also report the user.
@HuyLe82US, did this issue get resolved? If not, could you try setting the errors='ignore' parameter when reading the PDF? This will skip any special characters that can't be encoded.
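To illustrate the trade-off of that suggestion, here is a small sketch of what errors='ignore' does with the cp1252 codec from the traceback. It uses plain str.encode rather than any pyzerox call, since where the parameter would be passed in pyzerox is not shown in this thread. The sample string is made up.

```python
text = "\u1edc ok"  # hypothetical sample containing the failing character

# errors="ignore" silently drops characters the codec cannot represent.
lossy = text.encode("cp1252", errors="ignore").decode("cp1252")

# errors="replace" keeps a visible "?" marker in their place instead.
replaced = text.encode("cp1252", errors="replace").decode("cp1252")

print(repr(lossy))     # ' ok'
print(repr(replaced))  # '? ok'
```

Either way the original character is lost, so for text with Vietnamese or other non-Latin-1 characters, writing the output as UTF-8 is likely the better fix.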
When I tried to OCR a .pdf file, I got this error. Here is the log:
I have already installed poppler-utils, and I also checked that the package is already in the project.