Should we use 2 points or point + width/height in bbox for model training?

I checked the Paddle implementation here:

```python
def _cal_spatial_position_embeddings(self, bbox):
    # bbox is indexed as (batch, seq_len, 4) and holds (x0, y0, x1, y1):
    # indices 0 and 2 go through the x embeddings, 1 and 3 through the y embeddings.
    try:
        left_position_embeddings = self.x_position_embeddings(bbox[:, :, 0])
        upper_position_embeddings = self.y_position_embeddings(bbox[:, :, 1])
        right_position_embeddings = self.x_position_embeddings(bbox[:, :, 2])
        lower_position_embeddings = self.y_position_embeddings(bbox[:, :, 3])
    except IndexError as e:
        raise IndexError("The :obj:`bbox` coordinate values should be within 0-1000 range.") from e

    # Height and width are derived from the two corner points,
    # so they do not have to be supplied in the input.
    h_position_embeddings = self.h_position_embeddings(bbox[:, :, 3] - bbox[:, :, 1])
    w_position_embeddings = self.w_position_embeddings(bbox[:, :, 2] - bbox[:, :, 0])
    return (
        left_position_embeddings,
        upper_position_embeddings,
        right_position_embeddings,
        lower_position_embeddings,
        h_position_embeddings,
        w_position_embeddings,
    )
```
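To make the consequence concrete, here is a minimal sketch in plain Python that repeats the same index arithmetic for a single box. The helper name `spatial_inputs` and the example box values are hypothetical, and the box is assumed to be already normalized to the 0-1000 range.

```python
# Hypothetical helper: mirrors, for one box, the arithmetic in
# _cal_spatial_position_embeddings. Indices 0..3 are read directly and
# height/width are derived as box[3] - box[1] and box[2] - box[0].
def spatial_inputs(box):
    left, upper, right, lower = box
    return {
        "left": left,
        "upper": upper,
        "right": right,
        "lower": lower,
        "height": lower - upper,  # bbox[:, :, 3] - bbox[:, :, 1]
        "width": right - left,    # bbox[:, :, 2] - bbox[:, :, 0]
    }

# The same word box (already scaled to 0-1000) written in two layouts.
xyxy = [100, 200, 300, 250]  # (x0, y0, x1, y1): two corner points
xywh = [100, 200, 200, 50]   # (x, y, w, h): top-left point + width/height

print(spatial_inputs(xyxy))
# {'left': 100, 'upper': 200, 'right': 300, 'lower': 250, 'height': 50, 'width': 200}
print(spatial_inputs(xywh))
# {'left': 100, 'upper': 200, 'right': 200, 'lower': 50, 'height': -150, 'width': 100}
```

With (x, y, w, h) input, the "right" and "lower" embeddings receive the width and height instead of coordinates, and the derived height comes out negative, which is not a valid embedding index.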

As you can see, `bbox[:, :, 3] - bbox[:, :, 1]` computes the height and `bbox[:, :, 2] - bbox[:, :, 0]` computes the width. Since the code derives the box width and height itself, we should use the same two-point order as LayoutLMv3, i.e. (x0, y0, x1, y1).
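If the annotations come as (x, y, w, h) in pixels, they can be converted before training. The sketch below assumes pixel inputs, a known page size, and LayoutLM-style scaling to 0-1000 with truncation; the function name `xywh_to_xyxy_1000` is hypothetical.

```python
# Hypothetical helper: convert a pixel box (x, y, w, h) into the
# two-point (x0, y0, x1, y1) layout, scaled to the 0-1000 range.
def xywh_to_xyxy_1000(box, page_width, page_height):
    x, y, w, h = box
    x0, y0, x1, y1 = x, y, x + w, y + h

    def scale(v, size):
        # Scale to 0-1000, truncate to int, and clamp so the value
        # stays inside the range the embedding lookup expects.
        return min(max(int(1000 * v / size), 0), 1000)

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]

print(xywh_to_xyxy_1000([80, 200, 160, 50], page_width=800, page_height=1000))
# [100, 200, 300, 250]
```

If the boxes are already (x0, y0, x1, y1), only the scaling step is needed.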

NormXU / ERNIE-Layout-Pytorch, issue #15