ModelTC / United-Perception


Questions about EFL code implementation #19

Closed xc-chengdu closed 2 years ago

xc-chengdu commented 2 years ago

Hello, could you explain how the gradient collection function works? Although the gradient collection function is defined in the forward pass, it does not seem to be called there. Even if self.pos_neg.detach() is used, what is the input argument to collect_grad()? Does it actually take effect?

image

EthanChen1234 commented 2 years ago

hi, have you tested the Benchmark Result of YOLOX* Series? (#20)

waveboo commented 2 years ago

Hi @xc-chengdu , we collect the gradients via a backward hook. The code can be found here: code. pos_neg, pos_grad, and neg_grad are all buffers used to store the collected gradients. Please feel free to ask any further questions, thanks.
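To make the mechanism concrete, here is a minimal, self-contained sketch of a backward-hook collector. The class name ToyClsHead, the buffer name grad_sum, and the toy loss are illustrative assumptions, not the repository's actual code; EFL additionally splits the collected gradient into positive/negative parts using the targets.

```python
import torch
import torch.nn as nn

class ToyClsHead(nn.Module):
    """Toy classification head; cls_out stands in for the last conv before the cls loss."""
    def __init__(self, in_channels=256, num_classes=37):
        super().__init__()
        self.cls_out = nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=1)
        # buffer that accumulates per-class |gradient| sums
        self.register_buffer("grad_sum", torch.zeros(num_classes))
        # the hook fires during loss.backward(), not during forward()
        # (newer torch versions prefer register_full_backward_hook)
        self.cls_out.register_backward_hook(self._collect_grad)

    def _collect_grad(self, module, grad_in, grad_out):
        # grad_out[0] is d(loss)/d(logits), shape (N, C, H, W): exactly the
        # gradient flowing back from the loss into this layer's output
        g = grad_out[0].detach().abs()       # (N, C, H, W)
        g = g.permute(0, 2, 3, 1)            # (N, H, W, C)
        g = g.reshape(-1, g.size(-1))        # (N*H*W, C)
        self.grad_sum += g.sum(dim=0)

    def forward(self, x):
        return self.cls_out(x)

# usage: the hook runs automatically when backward() is called
head = ToyClsHead()
logits = head(torch.randn(2, 256, 8, 8))
loss = torch.sigmoid(logits).sum()           # stand-in for the real focal loss
loss.backward()
print(head.grad_sum.shape)                   # torch.Size([37])
```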

xc-chengdu commented 2 years ago

@waveboo Thank you very much for your reply. My intention is to use your improved Equalized Focal Loss, but I cannot figure out what inputs are passed to the gradient collection function. For example, which input does grad_out[0] represent? Is it possible to call the gradient collection function in the forward pass, the way the eqlv2 implementation does? When I use EFL directly, it improves the accuracy of the tail categories a bit, but it suppresses the head accuracy too much, which makes it worse than focal loss. I found through debugging that the gradient collection function is not being called, so how can I fix this? I am looking forward to your reply again!

xc-chengdu commented 2 years ago

> hi, have you tested the Benchmark Result of YOLOX* Series? (#20)

I just wanted to apply their improved loss function to my own task, so I have not reproduced their results.

waveboo commented 2 years ago

@xc-chengdu ,

  1. About the usage of the gradient collection hook, you could refer to this link. We register a backward hook on the last layer of the classification subnet (before the cls loss_fn), so the gradient of that module's output (grad_out[0]) is exactly the gradient coming from the loss function. It is no different from computing the gradient manually during the forward stage.
  2. An alternative is to register a full backward hook (link) on the loss module and read the gradient of its input (grad_in[0]). This is a feature supported by recent versions of torch (1.8+); see the sketch after this list.
  3. What's more, you could also compute the gradient of each category manually, as eqlv2 does. One thing to note is that the derivative of the focal loss differs from that of the sigmoid loss, so you should not use eqlv2's collect function directly but need to implement your own focal-loss version.
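A minimal sketch of option 2, assuming a plain sigmoid focal loss module with a single fixed gamma; the class name ToySigmoidFocalLoss and the buffer name grad_sum are illustrative, not the repository's code.

```python
import torch
import torch.nn as nn

class ToySigmoidFocalLoss(nn.Module):
    """Illustrative loss module; the full backward hook on its input is the point here."""
    def __init__(self, gamma=2.0, num_classes=37):
        super().__init__()
        self.gamma = gamma
        self.register_buffer("grad_sum", torch.zeros(num_classes))
        # torch >= 1.8: grad_input[0] is the gradient w.r.t. the first forward input
        # (the logits), i.e. the same quantity that grad_out[0] gives on the previous layer
        self.register_full_backward_hook(self._collect_grad)

    def _collect_grad(self, module, grad_input, grad_output):
        g = grad_input[0]
        if g is not None:
            self.grad_sum += g.detach().abs().reshape(-1, self.grad_sum.numel()).sum(dim=0)

    def forward(self, logits, targets):
        p = torch.sigmoid(logits)
        pt = p * targets + (1 - p) * (1 - targets)
        return (-(1 - pt) ** self.gamma * torch.log(pt.clamp(min=1e-6))).mean()

# usage
loss_fn = ToySigmoidFocalLoss()
logits = torch.randn(8, 37, requires_grad=True)
targets = torch.zeros(8, 37)
targets[torch.arange(8), torch.randint(0, 37, (8,))] = 1.0
loss_fn(logits, targets).backward()
print(loss_fn.grad_sum.shape)   # torch.Size([37])
```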
xc-chengdu commented 2 years ago

Gradient tensor concatenation:

        grad = torch.cat(self.grad_buffer[::-1], dim=1).reshape(-1, self.num_classes)

@waveboo The output of each FPN level has a different size; the gradient shapes are [32768, 37], [8192, 37], [2048, 37], [512, 37], and [128, 37]. Running the line above therefore fails because tensors with mismatched dimensions cannot be concatenated. How does your work ensure that the outputs of all feature levels can be concatenated? Below is my manual gradient collection implemented following eqlv2: image

waveboo commented 2 years ago

@xc-chengdu , we collect the gradients of each subnet level into grad_buffer, and once the buffer holds the gradients of all five levels we concatenate them. Before saving a gradient into the buffer, we reshape it to (batch_size, -1, num_classes), so the tensors can be concatenated along dim 1. Your tensors are 2-dimensional with different lengths in dim 0, which is why they cannot be concatenated at dim 1.
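To illustrate the reshape-then-concat step described above: the batch size, level sizes, and variable names below are assumed for the example, chosen so the flattened shapes match the ones quoted in the previous comment.

```python
import torch

num_classes, batch_size = 37, 2
# per-image spatial sizes of a 5-level FPN cls output (chosen so that
# batch_size * size reproduces the shapes 32768 / 8192 / 2048 / 512 / 128 above)
level_sizes = [16384, 4096, 1024, 256, 64]

grad_buffer = []
for size in level_sizes:
    flat_grad = torch.randn(batch_size * size, num_classes)             # per-level gradient, 2-D
    grad_buffer.append(flat_grad.reshape(batch_size, -1, num_classes))  # (B, size, C)

# all entries now agree in dims 0 and 2, so they can be concatenated along dim 1
grad = torch.cat(grad_buffer[::-1], dim=1).reshape(-1, num_classes)
print(grad.shape)   # torch.Size([43648, 37]) = [32768 + 8192 + 2048 + 512 + 128, 37]
```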

And one thing you should notice: you should not use our collect function directly, but need to implement your own focal-loss version, because our gradient collect function is designed for the automatic gradient collection hook. If you want to collect the gradients manually, you need to derive the gradient of the focal loss and implement the collect function yourself.
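For reference, a hedged sketch of what a manual, eqlv2-style collector could look like once the sigmoid cross-entropy gradient (p - t) is replaced by the closed-form derivative of the sigmoid focal loss. It assumes a single fixed gamma, omits the alpha term and EFL's per-category focusing factors, and the function name collect_focal_grad is illustrative.

```python
import torch

def collect_focal_grad(cls_score, target, gamma=2.0, eps=1e-6):
    """Per-class sums of |dFL/dlogit|, split into positive and negative samples.

    cls_score: (N, C) raw logits; target: (N, C) one-hot / multi-hot labels.
    Closed-form derivative of the sigmoid focal loss w.r.t. the logit x:
      t = 1:  (1-p)^g * (g * p * log(p) - (1-p))
      t = 0:  p^g     * (p - g * (1-p) * log(1-p))
    where p = sigmoid(x).
    """
    p = torch.sigmoid(cls_score)
    pos_part = (1 - p) ** gamma * (gamma * p * torch.log(p.clamp(min=eps)) - (1 - p))
    neg_part = p ** gamma * (p - gamma * (1 - p) * torch.log((1 - p).clamp(min=eps)))
    grad = (target * pos_part + (1 - target) * neg_part).abs().detach()
    pos_grad = (grad * target).sum(dim=0)        # accumulated over positive samples
    neg_grad = (grad * (1 - target)).sum(dim=0)  # accumulated over negative samples
    return pos_grad, neg_grad
```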

waveboo commented 2 years ago

@xc-chengdu Meanwhile, we highly recommend using the gradient collection hook, because it is simpler and less error-prone.