CosmosShadow / gptpdf

Using GPT to parse PDF
MIT License
2.89k stars 219 forks source link

Incorrect Request Body Construction for Xinference Deployment of GLM4v #15

Closed zhanghx0905 closed 2 months ago

zhanghx0905 commented 3 months ago

When sending requests to a locally deployed GLM4v model using Xinference, an error is encountered:

LLM (Large Language Model) error, Please check your key or base_url, or network.

Upon investigation, it was identified that the request body construction for openai messages is incorrect. The image_url.url field should include a format hint "data:image/png;base64," to properly encode the image data.

Steps to Reproduce

  1. Deploy GLM4v using Xinference locally.
  2. Run the example
    
    from gptpdf import parse_pdf

pdf_path = "./attention_is_all_you_need.pdf" output_dir = "./attention_is_all_you_need/"

Use OPENAI_API_KEY and OPENAI_API_BASE from environment variables

content, image_paths = parse_pdf( pdf_path, output_dir=output_dir, model="glm-4v", verbose=True, api_key="KEY", base_url="URL", ) print(content) print(image_paths)

zhanghx0905 commented 3 months ago

https://github.com/CosmosShadow/GeneralAgent/blob/3fbe8fa0118f53d3115eddc08b13bb780240a542/GeneralAgent/skills/llm_inference.py#L118

That's the root of the problem. One simple solution is to change the name of the locally deployed glm4-v model.

https://github.com/CosmosShadow/GeneralAgent/pull/7

zRzRzRzRzRzRzR commented 3 months ago

zhipuAI API不支持使用 data:image/png;base64, 只能传入图像内容,所以我提了这个PR,如果你使用的是本地模型,解决方案是按照这个issue的的方案改模型名不识别,这或许可行

Grewizard11 commented 3 months ago

Has anyone successfully tested GLM-4v?

zRzRzRzRzRzRzR commented 3 months ago

GLM-4V的API是我测试的,有遇到什么错误吗。现在可能要先 pip install GeneralAgent==0.3.19

Grewizard11 commented 2 months ago

GLM-4V的API是我测试的,有遇到什么错误吗。现在可能要先 pip install GeneralAgent==0.3.19

原来的问题和楼主一样,升级0.3.19之后问题解决了,非常感谢!