biubug6 / Pytorch_Retinaface

RetinaFace gets 80.99% on the WIDER FACE hard val set using MobileNet0.25.

PriorBox forward is slow #131

Open · gan3sh500 opened this issue 4 years ago

gan3sh500 commented 4 years ago

The prior box forward function is not vectorised. It is easily vectorisable, as below.

 from itertools import product
 from math import ceil

 import numpy as np
 import torch


 def priorbox_forward(min_sizes, steps, clip, image_size):
     feature_maps = [[ceil(image_size[0] / step), ceil(image_size[1] / step)] for step in steps]
     anchors = []
     for k, f in enumerate(feature_maps):
         # every (row, col, min_size) combination for this feature map
         mat = np.array(list(product(range(f[0]), range(f[1]), min_sizes[k]))).astype(np.float32)
         mat[:, 0] = (mat[:, 0] + 0.5) * steps[k] / image_size[1]
         mat[:, 1] = (mat[:, 1] + 0.5) * steps[k] / image_size[0]
         # duplicate min_size into a fourth column, then normalise to box width/height
         mat = np.concatenate([mat, mat[:, 2:3]], axis=1)
         mat[:, 2] = mat[:, 2] / image_size[1]
         mat[:, 3] = mat[:, 3] / image_size[0]
         anchors.append(mat)
     output = np.concatenate(anchors, axis=0)
     if clip:
         output = np.clip(output, 0, 1)
     return torch.from_numpy(output)

Can I submit a PR? The vectorisation makes it 2x faster for me on a Ryzen 5 3600.
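
For context, a minimal timing sketch along these lines can reproduce the comparison; the config values mirror this repo's mobilenet0.25 defaults, while the import path and the assumption that `priorbox_forward` is in scope are mine:

    # Hypothetical benchmark sketch: the repo's loop-based PriorBox.forward vs
    # the vectorised priorbox_forward above, on a single 640x640 prior grid.
    import time

    from layers.functions.prior_box import PriorBox  # loop-based original

    cfg = {'min_sizes': [[16, 32], [64, 128], [256, 512]], 'steps': [8, 16, 32], 'clip': False}
    image_size = (640, 640)

    t0 = time.perf_counter()
    loop_out = PriorBox(cfg, image_size=image_size).forward()  # nested Python loops
    t1 = time.perf_counter()
    vec_out = priorbox_forward(cfg['min_sizes'], cfg['steps'], cfg['clip'], image_size)
    t2 = time.perf_counter()

    print(f"loop: {(t1 - t0) * 1e3:.1f} ms   vectorised: {(t2 - t1) * 1e3:.1f} ms")
    print("shapes match:", tuple(loop_out.shape) == tuple(vec_out.shape))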

rafale77 commented 3 years ago

Old post, but I think this could be useful. Not sure what is stopping you from submitting a PR. I tried your function, and strangely I am not getting exactly the same result...
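
One way to pin down such a mismatch is an element-wise comparison of the two prior tensors. This is a sketch, not part of the thread; it assumes `priorbox_forward` is in scope and reuses the config from the benchmark sketch above:

    # Hypothetical equivalence check: report where the vectorised output
    # diverges from the repo's loop-based priors (both are (num_priors, 4)).
    import torch
    from layers.functions.prior_box import PriorBox

    cfg = {'min_sizes': [[16, 32], [64, 128], [256, 512]], 'steps': [8, 16, 32], 'clip': False}
    image_size = (640, 640)

    loop_out = PriorBox(cfg, image_size=image_size).forward()
    vec_out = priorbox_forward(cfg['min_sizes'], cfg['steps'], cfg['clip'], image_size)

    if not torch.allclose(loop_out, vec_out):
        diff = (loop_out - vec_out).abs().max(dim=1).values  # worst coordinate per prior
        bad = torch.nonzero(diff > 1e-6).squeeze(1)
        print(f"{bad.numel()} of {loop_out.shape[0]} priors differ, e.g. row {bad[0].item()}:")
        print(loop_out[bad[0]], vec_out[bad[0]], sep="\n")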

aengoo commented 3 years ago

> The prior box forward function is not vectorised. It is easily vectorisable, as below. [...] Can I submit a PR? The vectorisation makes it 2x faster for me on a Ryzen 5 3600.

When I ran it, it behaved abnormally (the x- and y-centres came out swapped), so I modified it slightly before using it. Thanks.

    # requires: import numpy as np; import torch; from itertools import product
    def vectorized_forward(self):
        anchors = []
        for k, f in enumerate(self.feature_maps):
            min_size = self.min_sizes[k]
            # every (row, col, min_size) combination for this feature map
            mat = np.array(list(product(range(f[0]), range(f[1]), min_size))).astype(np.float32)
            # x-centre uses the column index and y-centre the row index
            # (the snippet above had these two swapped)
            mat[:, 0], mat[:, 1] = ((mat[:, 1] + 0.5) * self.steps[k] / self.image_size[1],
                                    (mat[:, 0] + 0.5) * self.steps[k] / self.image_size[0])
            # duplicate min_size into a fourth column, then normalise to width/height
            mat = np.concatenate([mat, mat[:, 2:3]], axis=1)
            mat[:, 2] = mat[:, 2] / self.image_size[1]
            mat[:, 3] = mat[:, 3] / self.image_size[0]
            anchors.append(mat)
        output = np.concatenate(anchors, axis=0)
        if self.clip:
            output = np.clip(output, 0, 1)
        return torch.from_numpy(output)

It runs almost twice as fast for me too, though the runtime is still strongly affected by the input image resolution.
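
As a usage sketch, the corrected method could be attached to the repo's `PriorBox` class and verified against the loop-based output before swapping it in; the monkey-patch here is my assumption for testing, not something the repo ships:

    # Hypothetical wiring: bolt the corrected vectorized_forward onto the
    # repo's PriorBox and verify it matches the original forward.
    import torch
    from layers.functions.prior_box import PriorBox

    PriorBox.vectorized_forward = vectorized_forward  # test-only monkey-patch

    cfg = {'min_sizes': [[16, 32], [64, 128], [256, 512]], 'steps': [8, 16, 32], 'clip': False}
    box = PriorBox(cfg, image_size=(640, 640))

    # float32 rounding differs slightly between the two paths, so compare with
    # a tolerance rather than exact equality
    assert torch.allclose(box.forward(), box.vectorized_forward(), atol=1e-6)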