ambitious-octopus / model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization on efficient, constrained hardware. This project provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
https://sony.github.io/model_optimization/
Apache License 2.0

Detect Head Output #1

Open ambitious-octopus opened 1 month ago

ambitious-octopus commented 1 month ago

The output of the forward method of the detection head needs to be a torch.Tensor instead of a tuple; this would facilitate integration with our original YOLOv8 model. Would it be possible to modify the MCT pipeline to accept a single torch.Tensor with shape (B, 84, 8400) instead of a tuple split into y_bb with shape (B, 8400, 4) and y_cls with shape (B, 8400, 80)?
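For reference, the two tuple parts can be recombined into the single (B, 84, 8400) tensor by transposing back to channels-first and concatenating; a minimal sketch with dummy tensors matching the shapes above (the tensors here are illustrative, not actual model outputs):

```python
import torch

# Hypothetical example tensors with the shapes discussed in this issue (B=1).
y_bb = torch.rand(1, 8400, 4) * 640   # bounding-box coordinates in [0, 640]
y_cls = torch.rand(1, 8400, 80)       # per-class scores in [0, 1]

# Transpose both parts back to channels-first and concatenate along the
# channel dimension to recover the single (B, 84, 8400) layout.
y = torch.cat((y_bb.transpose(1, 2), y_cls.transpose(1, 2)), dim=1)
print(y.shape)  # torch.Size([1, 84, 8400])
```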

Sony Implementation:

    def forward(self, x: Tensor) -> Tuple[Tensor, Tensor]:
        shape = x[0].shape  # BCHW
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split(
            (self.reg_max * 4, self.nc), 1)

        y_cls = cls.sigmoid().transpose(1, 2)

        dfl = self.dfl(box)
        dfl = dfl * self.strides

        # box decoding
        lt, rb = dfl.chunk(2, 1)
        y1 = self.relu1(self.anchors.unsqueeze(0)[:, 0, :] - lt[:, 0, :])
        x1 = self.relu2(self.anchors.unsqueeze(0)[:, 1, :] - lt[:, 1, :])
        y2 = self.relu3(self.anchors.unsqueeze(0)[:, 0, :] + rb[:, 0, :])
        x2 = self.relu4(self.anchors.unsqueeze(0)[:, 1, :] + rb[:, 1, :])
        y_bb = torch.stack((x1, y1, x2, y2), 1).transpose(1, 2)
        return y_bb, y_cls

Original YOLOv8 implementation:

    def forward(self, x):
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        if self.end2end:
            return self.forward_end2end(x)

        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:  # Training path
            return x
        y = self._inference(x)
        return y if self.export else (y, x)
Idan-BenAmi commented 1 month ago

Hi @ambitious-octopus,

The reason we have this modification (a tuple instead of a single Tensor) is the different value ranges of the two parts of the original tensor: y_bb (bounding-box coordinates, 4x8400) takes values in [0, 640], while y_cls (per-class scores, 80x8400) takes values in [0, 1].

This makes quantization of the single combined tensor very problematic and results in poor accuracy of the quantized model: one part needs high resolution within [0, 1], while the other needs the quantization range stretched up to 640. The solution we suggested is therefore to keep the two parts separate during MCT quantization.
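A tiny numeric sketch illustrates the problem (this is plain uniform 8-bit per-tensor quantization for demonstration, not MCT's actual quantizer): when the two value ranges share one scale, the [0, 1] scores lose essentially all resolution.

```python
import torch

def fake_quant(t, n_bits=8):
    # Uniform per-tensor quantization: a single scale for the whole tensor.
    scale = t.max() / (2 ** n_bits - 1)
    return torch.round(t / scale) * scale

boxes = torch.rand(4, 8400) * 640    # box coordinates in [0, 640]
scores = torch.rand(80, 8400)        # class scores in [0, 1]

# Quantizing the concatenated tensor forces the scores onto the coarse
# scale dictated by the box coordinates (~640 / 255 ≈ 2.5 per step).
combined = fake_quant(torch.cat((boxes, scores), dim=0))
err_combined = (combined[4:] - scores).abs().mean()

# Quantizing the scores separately gives them a fine scale (~1 / 255).
err_split = (fake_quant(scores) - scores).abs().mean()
print(err_combined, err_split)  # the combined error is orders of magnitude larger
```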

Let me know if you need a more detailed explanation. Thanks, Idan

Laughing-q commented 1 month ago

@Idan-BenAmi Hi! Thanks for the explanation! However, modifying the original tensor output to a tuple in the ultralytics package would break all of our current inference pipelines. Would it be possible to split the original tensor into a tuple in the model_optimization repo before actually starting the MCT quantization? Thanks

Idan-BenAmi commented 3 days ago

Hi @Laughing-q and @ambitious-octopus, I missed your last question, sorry for the delay. MCT isn't designed to handle this kind of manipulation. What do you think about keeping the split operation within the export code?
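One way to keep both codebases unchanged would be a thin adapter applied only in the export/quantization script: wrap the model so it re-exposes the tuple MCT quantizes well, while the shipped model keeps its single-tensor output. This is a sketch under the assumption that the wrapped model returns the (B, 84, 8400) tensor; the wrapper name and its placement are hypothetical, not part of either repository.

```python
import torch
import torch.nn as nn

class SplitOutputWrapper(nn.Module):
    """Hypothetical adapter: wraps a model returning a (B, 84, 8400) tensor
    and splits it into the (y_bb, y_cls) tuple expected for quantization."""

    def __init__(self, model, num_coords=4):
        super().__init__()
        self.model = model
        self.num_coords = num_coords

    def forward(self, x):
        y = self.model(x)  # assumed shape: (B, 84, 8400)
        boxes, scores = y.split((self.num_coords, y.shape[1] - self.num_coords), dim=1)
        # Match the layout from the Sony head: (B, 8400, 4) and (B, 8400, 80).
        return boxes.transpose(1, 2), scores.transpose(1, 2)
```

The wrapped model (e.g. `SplitOutputWrapper(yolo_model)`) would then be handed to the quantization flow in place of the original, so the split lives entirely in the export code.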