OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

💡 [REQUEST] - MiniCPM-V: support returning text coordinates from OCR recognition #594

Closed: FreemanFeng closed this issue 1 month ago

FreemanFeng commented 2 months ago

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

Currently, MiniCPM-V can only recognize text through OCR; it cannot accurately return the coordinates of that text.

Basic Example

Recognize the text in the image and return the corresponding coordinates in the following JSON format: {"text": <recognized text>, "box": <[xmin, ymin, width, height]>}
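To make the requested format concrete, here is a minimal sketch of how a caller might parse and validate a response in this schema and convert each box to corner form (xmin, ymin, xmax, ymax). It assumes the model has been prompted to emit exactly the JSON shown above, either as a single object or as a list of objects; `TextBox` and `parse_ocr_boxes` are hypothetical names, not part of MiniCPM-V. The request does not say whether coordinates are pixels or normalized values, so the sketch treats them as plain numbers.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class TextBox:
    """One recognized text region in the requested [xmin, ymin, width, height] format."""
    text: str
    xmin: float
    ymin: float
    width: float
    height: float

    def to_corners(self) -> tuple[float, float, float, float]:
        """Convert to (xmin, ymin, xmax, ymax), a common alternative box convention."""
        return (self.xmin, self.ymin, self.xmin + self.width, self.ymin + self.height)


def parse_ocr_boxes(raw: str) -> list[TextBox]:
    """Parse model output in the requested schema (one object or a list of objects).

    Hypothetical helper: assumes the model emits exactly
    {"text": ..., "box": [xmin, ymin, width, height]} and nothing else.
    """
    data = json.loads(raw)
    items = data if isinstance(data, list) else [data]
    boxes = []
    for item in items:
        box = item["box"]
        if len(box) != 4:
            raise ValueError(f"expected 4 box values, got {len(box)}: {box!r}")
        xmin, ymin, width, height = (float(v) for v in box)
        if width < 0 or height < 0:
            raise ValueError(f"negative width/height in box {box!r}")
        boxes.append(TextBox(str(item["text"]), xmin, ymin, width, height))
    return boxes


if __name__ == "__main__":
    result = parse_ocr_boxes('[{"text": "Hello", "box": [10, 20, 50, 12]}]')
    print(result[0].text, result[0].to_corners())
```

Rejecting malformed boxes early is deliberate: a generative model can return boxes with the wrong number of values, so validating before drawing or cropping avoids silent errors downstream.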

Drawbacks

Not sure.

Unresolved questions

No response

LDLINGLINGLING commented 2 months ago

Please see this tutorial: https://modelbest.feishu.cn/wiki/HLRiwNgKEic6cckGyGucFvxQnJw?from=from_copylink