In version 2.6, the SER model's max_seq_len is 512; all predictions beyond that length are 0

aaferrero commented 1 year ago

    In ./2.6/ppocr/postprocess/vqa_token_ser_layoutlm_postprocess.py:

def _infer(self, preds, segment_offset_ids, ocr_infos):
    results = []

    for pred, segment_offset_id, ocr_info in zip(preds, segment_offset_ids,
                                                 ocr_infos):
        pred = np.argmax(pred, axis=1)
        pred = [self.id2label_map[idx] for idx in pred]

        for idx in range(len(segment_offset_id)):
            if idx == 0:
                start_id = 0
            else:
                start_id = segment_offset_id[idx - 1]

            end_id = segment_offset_id[idx]

            curr_pred = pred[start_id:end_id]
            curr_pred = [self.label2id_map_for_draw[p] for p in curr_pred]

            if len(curr_pred) <= 0:
                pred_id = 0
            else:
                counts = np.bincount(curr_pred)
                pred_id = np.argmax(counts)

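In the 2.6 version of LayoutXLM, every inference and prediction path truncates the input once it exceeds 512 tokens, so pred has length 512, while segment_offset_id holds the offsets of the full tokenized text. For every segment that starts past token 512, pred[start_id:end_id] is empty, so pred_id always falls back to class 0. This looks like a bug.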
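
The mismatch described above can be reproduced outside PaddleOCR with a minimal sketch. It mirrors only the slicing and voting logic from the quoted excerpt; `vote_per_segment`, the 750-token input, and the segment offsets are hypothetical names and data made up for illustration, not part of the library.

```python
import numpy as np

MAX_SEQ_LEN = 512  # truncation length described in this issue


def vote_per_segment(pred_ids, segment_offset_id):
    """Majority vote per segment, mirroring the slicing in the quoted _infer."""
    results = []
    for idx, end_id in enumerate(segment_offset_id):
        start_id = 0 if idx == 0 else segment_offset_id[idx - 1]
        curr_pred = pred_ids[start_id:end_id]
        if len(curr_pred) <= 0:
            results.append(0)  # empty slice falls back to class 0
        else:
            results.append(int(np.argmax(np.bincount(curr_pred))))
    return results


# Hypothetical data: 750 tokens that would all be labelled class 2,
# but only the first 512 predictions survive truncation.
full_pred = np.full(750, 2, dtype=np.int64)
truncated_pred = full_pred[:MAX_SEQ_LEN]
segment_offset_id = [200, 400, 600, 750]  # offsets over the full 750 tokens

truncated_votes = vote_per_segment(truncated_pred, segment_offset_id)
full_votes = vote_per_segment(full_pred, segment_offset_id)
print(truncated_votes, full_votes)
```

With truncated predictions, the last segment (tokens 600 to 750) gets an empty slice and is assigned class 0, and the segment straddling token 512 is voted on only part of its tokens.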

Originally posted by @aaferrero in https://github.com/PaddlePaddle/PaddleOCR/issues/7974#issuecomment-1283504791

WenmuZhou commented 1 year ago

The token length is currently limited to 512. You could try changing it to 1024.

aaferrero commented 1 year ago

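Setting it to 1024 raises an error: training currently has to load the pretrained model, so the token length can only be 512. Version 2.4 supported prediction on inputs of different token lengths, so this is probably a bug in 2.6. I modified 2.6 following the 2.4 code, and it now supports prediction on variable-length token sequences.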

WenmuZhou commented 1 year ago

Could you open a PR with your changes so we can take a look?

aaferrero commented 1 year ago

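https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6/ppocr/data/imaug/vqa The rough idea is to change how the data is packed in there. For example, if the token length is 750, pad it to 512 + 512 = 1024, reshape it to (2, 512), and feed it into the network as a batch of two for prediction.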
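
The pad-and-reshape idea above can be sketched roughly as follows. This is not the actual patch, which was never posted in this thread. `split_into_chunks`, `merge_chunk_preds`, the pad id of 0, and the 7-class dummy output are all assumptions for illustration. A real change would presumably also have to split the other per-token inputs (such as bounding boxes and the attention mask) the same way.

```python
import numpy as np

CHUNK_LEN = 512  # max_seq_len from this issue
PAD_ID = 0       # assumption: the real pad token id depends on the tokenizer


def split_into_chunks(input_ids, chunk_len=CHUNK_LEN, pad_id=PAD_ID):
    """Pad a 1-D token id sequence to a multiple of chunk_len, reshape to (n, chunk_len)."""
    input_ids = np.asarray(input_ids)
    n_chunks = max(1, -(-len(input_ids) // chunk_len))  # ceiling division
    padded = np.full(n_chunks * chunk_len, pad_id, dtype=input_ids.dtype)
    padded[:len(input_ids)] = input_ids
    return padded.reshape(n_chunks, chunk_len)


def merge_chunk_preds(chunk_preds, orig_len):
    """Flatten per-chunk predictions of shape (n, chunk_len, ...) and drop the padding."""
    chunk_preds = np.asarray(chunk_preds)
    flat = chunk_preds.reshape(-1, *chunk_preds.shape[2:])
    return flat[:orig_len]


tokens = np.arange(1, 751)                  # hypothetical 750 token ids
batch = split_into_chunks(tokens)           # shape (2, 512)
fake_logits = np.zeros(batch.shape + (7,))  # stand-in for model output, 7 classes
merged = merge_chunk_preds(fake_logits, len(tokens))  # shape (750, 7)
print(batch.shape, merged.shape)
```

After merging, the prediction array is as long as the original token sequence again, so the offsets in segment_offset_id index into it without hitting empty slices.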

jackieZhouQQ commented 11 months ago

@aaferrero Could you post the modified code? I ran into the same problem 😂

piarosebelledelapaz commented 1 month ago

@jackieZhouQQ @aaferrero Were you able to solve the issue? If so, could you please provide guidance on how to do it?

PaddlePaddle / PaddleOCR

In version 2.6, the SER model's max_seq_len is 512; all predictions beyond that length are 0 #8009