Which datasets were the PP-OCRv3 and PP-OCRv4 detection models trained on?

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Apache License 2.0

Which datasets were the PP-OCRv3 and PP-OCRv4 detection models trained on? #11992

Closed: JIANG3330 closed this issue 2 weeks ago

JIANG3330 commented 3 weeks ago

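Which datasets were the PP-OCRv3 and PP-OCRv4 detection models trained on? And on which dataset was the hmean metric evaluated? I could not find this information in the introduction document: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/PP-OCRv4_introduction.md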

Sunting78 commented 2 weeks ago

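Hello. For the training data, we used public datasets together with data we collected ourselves, and filtered it over several rounds. The evaluation set is a dataset made up mainly of Chinese data that covers a fairly broad range of scenarios. It has not been released yet.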
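
For readers unfamiliar with the hmean metric asked about above: in text detection evaluation it conventionally denotes the harmonic mean of precision and recall (the F1 score). The following is a minimal sketch of that formula, assuming the number of matched detections has already been counted; the function and parameter names are hypothetical and are not PaddleOCR's API, and the thread does not describe how the matching itself was done.

```python
def hmean(num_matched: int, num_detected: int, num_ground_truth: int) -> float:
    """Harmonic mean of precision and recall for a detection result.

    All three arguments are illustrative names (assumptions):
    num_matched      -- predicted boxes matched to a ground-truth box
    num_detected     -- total predicted boxes
    num_ground_truth -- total ground-truth boxes
    """
    # Guard against division by zero when there are no predictions or labels.
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 8 matches out of 10 predictions and 16 ground-truth boxes.
    print(f"hmean = {hmean(8, 10, 16):.4f}")
```

Because the evaluation set is not public, values computed this way on another dataset would not be directly comparable with the hmean figures reported for PP-OCRv3 and PP-OCRv4.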