JiaquanYe / TableMASTER-mmocr

2nd-place solution to the ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

In the code, the input first passes through two shared transformer decoder layers and then one decoder layer per head; why does the flowchart show one shared layer first, followed by two decoder layers per head? #65

Open cqray1990 opened 1 year ago

cqray1990 commented 1 year ago
def decode(self, input, feature, src_mask, tgt_mask):
    # Main decoding pass of the transformer decoder.
    x = self.embedding(input)
    x = self.positional_encoding(x)

    # Shared transformer decoder layers, applied first
    # (two layers in the released configuration).
    for layer in self.layers:
        x = layer(x, feature, src_mask, tgt_mask)

    # Classification head: one decoder layer producing structure-token features.
    for layer in self.cls_layer:
        cls_x = layer(x, feature, src_mask, tgt_mask)
    cls_x = self.norm(cls_x)

    # Bounding-box head: one decoder layer producing box-regression features.
    for layer in self.bbox_layer:
        bbox_x = layer(x, feature, src_mask, tgt_mask)
    bbox_x = self.norm(bbox_x)

    return self.cls_fc(cls_x), self.bbox_fc(bbox_x)
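
For context, the layer split the title asks about is fixed in the decoder's constructor, not in decode() itself. Below is a minimal sketch of that construction, assuming a MASTER-style clones helper and a total layer budget N; the class name, the default N, and the comments are illustrative, not the repo's verbatim code.

import copy
import torch.nn as nn

def clones(module, n):
    # n independent deep copies of a prototype layer.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(n)])

class TableDecoderSketch(nn.Module):
    # Hypothetical constructor: with N = 3 this gives two shared layers
    # and one layer per head, matching the decode() above. The paper's
    # flowchart instead corresponds to one shared layer and two layers
    # per head; total depth is three either way.
    def __init__(self, decoder_layer, N=3):
        super().__init__()
        self.layers = clones(decoder_layer, N - 1)   # shared stack
        self.cls_layer = clones(decoder_layer, 1)    # structure-token head
        self.bbox_layer = clones(decoder_layer, 1)   # bbox-regression head

# Usage (hypothetical): any nn.Module can serve as the prototype layer, e.g.
# dec = TableDecoderSketch(nn.TransformerDecoderLayer(d_model=512, nhead=8), N=3)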
delveintodetail commented 1 year ago

This project is a re-implementation done after I left the company, not our original model, but the overall results are consistent and the experimental numbers are about the same as our earliest ones. For the competition and the paper we used FastOCR; later, for open-sourcing, we re-implemented the TableMaster algorithm on top of mmocr. Both the paper's scheme and the code's scheme work.