💡 [REQUEST] - MiniCPM-V support for returning text coordinates in OCR recognition

OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Apache License 2.0

12.75k stars, 893 forks

Closed: FreemanFeng closed this issue 1 month ago

FreemanFeng commented 2 months ago

No response

No response

Currently, the model can recognize text via OCR, but it cannot accurately return the coordinates of that text.

Recognize the text in the image and return the corresponding coordinates, using the following JSON format: {"text": <recognized text>, "box": <[xmin, ymin, width, height]>}
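
To make the requested format concrete, here is a minimal Python sketch of how a caller might parse and sanity-check a reply in that shape. It is not part of MiniCPM-V: the helper names `parse_ocr_reply` and `to_corners` are hypothetical, and the sketch assumes the reply is either a single JSON object or a JSON list of such objects, which the request above does not specify.

```python
import json
from typing import List, Tuple, TypedDict


class OcrItem(TypedDict):
    text: str
    box: List[float]  # [xmin, ymin, width, height], as in the requested format


def parse_ocr_reply(reply: str) -> List[OcrItem]:
    """Parse a reply holding one JSON object or a list of objects (assumed shape)."""
    data = json.loads(reply)
    items = data if isinstance(data, list) else [data]
    parsed: List[OcrItem] = []
    for item in items:
        box = item["box"]
        # A usable box has four numbers and a non-negative width and height.
        if len(box) != 4 or box[2] < 0 or box[3] < 0:
            raise ValueError(f"malformed box: {box!r}")
        parsed.append({"text": str(item["text"]), "box": [float(v) for v in box]})
    return parsed


def to_corners(box: List[float]) -> Tuple[float, float, float, float]:
    """Convert [xmin, ymin, width, height] to (xmin, ymin, xmax, ymax)."""
    xmin, ymin, width, height = box
    return (xmin, ymin, xmin + width, ymin + height)
```

Converting to corner form is optional; it is shown only because many drawing and evaluation tools expect (xmin, ymin, xmax, ymax) rather than a width and height.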

Not sure

No response

LDLINGLINGLING commented 2 months ago