PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

In version 2.6, the SER model's max_seq_len is 512; once the input exceeds that length, all predictions are 0 #8009

Closed aaferrero closed 1 year ago

aaferrero commented 1 year ago
In ./2.6/ppocr/postprocess/vqa_token_ser_layoutlm_postprocess.py:

def _infer(self, preds, segment_offset_ids, ocr_infos):
    results = []

    for pred, segment_offset_id, ocr_info in zip(preds, segment_offset_ids,
                                                 ocr_infos):
        pred = np.argmax(pred, axis=1)
        pred = [self.id2label_map[idx] for idx in pred]

        for idx in range(len(segment_offset_id)):
            if idx == 0:
                start_id = 0
            else:
                start_id = segment_offset_id[idx - 1]

            end_id = segment_offset_id[idx]

            curr_pred = pred[start_id:end_id]
            curr_pred = [self.label2id_map_for_draw[p] for p in curr_pred]

            if len(curr_pred) <= 0:
                pred_id = 0
            else:
                counts = np.bincount(curr_pred)
                pred_id = np.argmax(counts)

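In LayoutXLM in version 2.6, every inference and prediction path truncates the input once it exceeds 512 tokens. As a result, pred has length 512, while segment_offset_id holds offsets over the full text after segmentation. For every segment that starts beyond token 512, the slice of pred is empty, so pred_id is always predicted as class 0. This looks like a bug.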
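The mismatch described above can be reproduced in isolation. The following is a minimal sketch in plain Python, not the PaddleOCR code itself: `majority_label` is a hypothetical helper that mirrors the per-segment voting loop quoted above, and the offsets and class ids are made-up illustration values.

```python
def majority_label(pred, segment_offset_id):
    """Hypothetical stand-in for the per-segment voting loop quoted above."""
    results = []
    for idx, end_id in enumerate(segment_offset_id):
        start_id = 0 if idx == 0 else segment_offset_id[idx - 1]
        # Slicing past the end of a list yields an empty list, not an error.
        curr_pred = pred[start_id:end_id]
        if len(curr_pred) <= 0:
            pred_id = 0  # silent fallback: the segment is labelled class 0
        else:
            # Plain-Python equivalent of np.argmax(np.bincount(curr_pred)).
            counts = [0] * (max(curr_pred) + 1)
            for p in curr_pred:
                counts[p] += 1
            pred_id = counts.index(max(counts))
        results.append(pred_id)
    return results


MAX_SEQ_LEN = 512
# Assumed model output: truncated to 512 tokens, every token predicted as class 2.
pred = [2] * MAX_SEQ_LEN
# Assumed segment end offsets, computed over the full 750-token text.
segment_offset_id = [300, 512, 650, 750]

labels = majority_label(pred, segment_offset_id)
print(labels)  # [2, 2, 0, 0]
```

The last two segments start at or after token 512, so their slices are empty and they fall through to class 0 regardless of what the document contains.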

Originally posted by @aaferrero in https://github.com/PaddlePaddle/PaddleOCR/issues/7974#issuecomment-1283504791

WenmuZhou commented 1 year ago

The token length is currently limited to 512. You could try changing it to 1024.

aaferrero commented 1 year ago

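Setting it to 1024 raises an error. Training currently has to load the pretrained model, so the token length can only be 512. Version 2.4 supported prediction on token sequences of varying length, so this looks like a bug in 2.6. I modified 2.6 following the 2.4 code, and it now supports prediction on varying token lengths.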

WenmuZhou commented 1 year ago

Would you mind opening a PR with your changes so we can take a look?

aaferrero commented 1 year ago

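https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6/ppocr/data/imaug/vqa The rough idea is to change how the data is packed in there. For example, if the token length is 750, pad it to 512 + 512 = 1024, reshape it to (2, 512), and feed it through the network as a batch of two.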
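The pad-and-split idea described above could look roughly like the following. This is only a sketch under assumptions, not the code that was actually changed: `pad_and_chunk` and `merge_chunk_preds` are hypothetical helper names, the pad id of 0 is assumed, and special tokens, bounding boxes, and attention masks (which a real LayoutXLM input also needs per chunk) are left out.

```python
def pad_and_chunk(input_ids, max_seq_len=512, pad_id=0):
    """Pad a token list to a multiple of max_seq_len and split it into rows."""
    # Ceiling division; keep at least one chunk even for empty input.
    n_chunks = max(1, -(-len(input_ids) // max_seq_len))
    padded = input_ids + [pad_id] * (n_chunks * max_seq_len - len(input_ids))
    return [
        padded[i * max_seq_len:(i + 1) * max_seq_len]
        for i in range(n_chunks)
    ]


def merge_chunk_preds(chunk_preds, original_len):
    """Flatten per-chunk predictions and drop the entries that came from padding."""
    flat = [p for chunk in chunk_preds for p in chunk]
    return flat[:original_len]


# 750 made-up token ids, as in the example above.
tokens = list(range(1, 751))
batch = pad_and_chunk(tokens)  # shape (2, 512)

# Stand-in for the model: here each "prediction" is just the token itself.
restored = merge_chunk_preds(batch, len(tokens))
print(len(batch), len(batch[0]), len(restored))  # 2 512 750
```

After merging, the prediction list is as long as the original token sequence, so slicing it with offsets from segment_offset_id no longer produces empty segments.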

jackieZhouQQ commented 11 months ago

@aaferrero Could you post the modified code? I've run into the same problem 😂

piarosebelledelapaz commented 1 month ago

@jackieZhouQQ @aaferrero Were you able to solve the issue? If so, could you please provide guidance on how to do it?