According to arxiv.org/pdf/2105.01848.pdf, the shared decoder has only one layer, and the bbox and structure decoders then have "n" layers. The paper specifically says that the decoders split after the first layer, rather than after the last layer as in the code here.
Correct me if I'm looking at the wrong code file.
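To make the difference concrete, here is a minimal plain-Python sketch of the two layouts as I understand them. It is not the repository's code: `decoder_paths`, the layer labels, and `n = 3` are hypothetical, and the layer counts for the late-split case are my assumption about what the code does.

```python
def decoder_paths(shared_layers: int, branch_layers: int) -> dict:
    """List, in order, the layers each head's features pass through."""
    if shared_layers < 0 or branch_layers < 0:
        raise ValueError("layer counts must be non-negative")
    # Layers applied once, before the decoder forks into two branches.
    shared = [f"shared_{i}" for i in range(shared_layers)]
    # Each head then gets its own, unshared stack of layers.
    return {
        head: shared + [f"{head}_{i}" for i in range(branch_layers)]
        for head in ("structure", "bbox")
    }


n = 3  # hypothetical depth, chosen only for illustration

# My reading of the paper: one shared layer, then n layers per branch.
early_split = decoder_paths(shared_layers=1, branch_layers=n)

# My reading of the code (assumed): shared layers first, split only at the end.
late_split = decoder_paths(shared_layers=n, branch_layers=1)

for name, paths in (("early split", early_split), ("late split", late_split)):
    print(name)
    for head, layers in paths.items():
        print(f"  {head}: {' -> '.join(layers)}")
```

With an early split, the two heads share only the first layer's weights; with a late split, they share everything except the final layer, so the branches have far less capacity to specialize.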