CosmosShadow / gptpdf

Using GPT to parse PDF
MIT License
2.76k stars, 212 forks

the MD file is empty... #2

Closed: michael7908 closed this 2 months ago

michael7908 commented 2 months ago

After running test.py with all the proper settings, I got the error below, and I can't fix it even though I have already added UTF-8-related code to test.py.

Traceback (most recent call last):
  File "C:\Users\zhang\GPT-Projects\FreeSideProjects\gptpdf\test\test.py", line 38, in <module>
    test_use_api_key()
  File "C:\Users\zhang\GPT-Projects\FreeSideProjects\gptpdf\test\test.py", line 22, in test_use_api_key
    content, image_paths = parse_pdf(pdf_path, output_dir=output_dir, api_key=api_key, base_url=base_url, model='gpt-4o', gpt_worker=6)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zhang\GPT-Projects\FreeSideProjects\gptpdf\venv\Lib\site-packages\gptpdf\parse.py", line 294, in parse_pdf
    content = _gpt_parse_images(image_infos, output_dir=output_dir, api_key=api_key, base_url=base_url, model=model, verbose=verbose, gpt_worker=gpt_worker)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zhang\GPT-Projects\FreeSideProjects\gptpdf\venv\Lib\site-packages\gptpdf\parse.py", line 261, in _gpt_parse_images
    f.write('\n\n'.join(contents))
UnicodeEncodeError: 'gbk' codec can't encode character '\u2020' in position 346: illegal multibyte sequence

Tendo33 commented 2 months ago

Add encoding='utf-8' when saving the file.
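For anyone hitting the same error, here is a minimal sketch of that suggestion. It assumes the failing write in parse.py opens the output file without an explicit encoding, so Python on a Chinese-locale Windows machine falls back to GBK. The helper name `save_markdown` and the sample contents are hypothetical and are not gptpdf's actual code.

```python
import os
import tempfile


def save_markdown(path, contents):
    """Hypothetical helper: join page contents and write them as UTF-8.

    Passing encoding explicitly keeps the result independent of the
    OS locale codec (GBK in the traceback above).
    """
    with open(path, 'w', encoding='utf-8') as f:
        f.write('\n\n'.join(contents))


# Sample data containing the dagger character U+2020 from the error message.
contents = ['Footnote marker: \u2020', 'Second page']

# Check whether GBK can represent the dagger; the traceback says it cannot.
try:
    '\u2020'.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

out_path = os.path.join(tempfile.mkdtemp(), 'output.md')
save_markdown(out_path, contents)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, encoding='utf-8') as f:
    round_trip = f.read()
```

Note that the change has to go where the file is opened for writing (inside the library), which is probably why adding UTF-8 code to test.py alone did not help.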

CosmosShadow commented 2 months ago

v0.0.5 fixes this. Update with pip install gptpdf==0.0.5