Closed CoinCheung closed 3 years ago
I'm not doubting your code. I am trying to learn and write a multi-task model that does both semantic segmentation and object detection; the segmentation head references ICNet, BiSeNet, and other algorithms. In my experiments, the test set is roughly 1.5% lower than the validation set, which is normal, but some algorithms perform essentially the same on the validation and test sets. I think this is less about the model itself and more about training tricks, so I came to ask whether you have submitted results on the test set.
In fact, I read and compared https://github.com/ycszen/TorchSeg/tree/master/model/bisenet with yours, and the network structure is basically the same. The difference is the segmentation head: yours uses a 3×3 convolution, then a 1×1 convolution for classification, then PixelShuffle; his first uses a 3×3 convolution down to 64 (main) / 128 (aux) channels, then a 1×1 convolution for classification, then bilinear interpolation. I also compared his OHEM implementation, and I prefer yours because it is more concise and runs 20 s faster per epoch than his on my machine.
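For readers unfamiliar with the OHEM (online hard example mining) loss being compared here, a minimal sketch of the concise per-pixel top-k style follows. The class name, thresholds, and shapes are illustrative, not copied from either repository:

```python
import torch
import torch.nn as nn

class OhemCELoss(nn.Module):
    """OHEM cross-entropy sketch: average only over the hardest pixels,
    i.e. those whose per-pixel loss exceeds a threshold, keeping at
    least n_min pixels so the gradient never vanishes entirely."""
    def __init__(self, thresh=0.7, n_min=16, ignore_index=255):
        super().__init__()
        # convert a probability threshold into a loss threshold: -log(p)
        self.thresh = -torch.log(torch.tensor(thresh, dtype=torch.float))
        self.n_min = n_min
        self.criterion = nn.CrossEntropyLoss(
            ignore_index=ignore_index, reduction='none')

    def forward(self, logits, labels):
        # per-pixel losses, flattened and sorted from hardest to easiest
        loss = self.criterion(logits, labels).view(-1)
        loss_sorted, _ = torch.sort(loss, descending=True)
        if loss_sorted[self.n_min] > self.thresh:
            # enough hard pixels: keep everything above the threshold
            loss_kept = loss_sorted[loss_sorted > self.thresh]
        else:
            # otherwise keep the n_min hardest pixels
            loss_kept = loss_sorted[:self.n_min]
        return loss_kept.mean()

# illustrative usage with 3 classes on 8x8 feature maps
crit = OhemCELoss(thresh=0.7, n_min=16)
logits = torch.randn(2, 3, 8, 8)
labels = torch.randint(0, 3, (2, 8, 8))
loss = crit(logits, labels)
```

The conciseness being praised comes from doing the mining with a single sort over the flattened loss map rather than per-image bookkeeping.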
Also I have two questions.
PixelShuffle looks like a good idea, but why did you decide to use it? In your experiments, did you compare the speed and accuracy of bilinear interpolation versus PixelShuffle?
The auxiliary losses and the main loss of BiSeNet seem to have the same weight. What is your view on this? Since the main pathway is what we actually use at inference time, reducing the weight of the auxiliary losses seems like a better approach.
Hi @TomMao23,
I used F.interpolate originally, which also works, but I ran into problems when I exported the model to ONNX and compiled it with TensorRT 7.0.0, so I replaced it with nn.PixelShuffle. That was almost a year ago. Now that TensorRT 7.2 is out, we could switch back to interpolation. From my observation, TensorRT 7.2 requires cuda 10.2 + cudnn 7 and a GPU driver above version 440, while TensorRT 7.0 only requires driver version 418. To stay compatible with older GPU drivers, I continued to use TensorRT 7.0 and stuck with nn.PixelShuffle.
As for the weight of the auxiliary losses, I found that using the same weight gives acceptable results, so I did not tune it carefully. Other values may well work better.
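To make the trade-off concrete, here is a minimal sketch of the two weighting schemes. The shapes and the 0.4 factor are illustrative (0.4 is the auxiliary weight used by PSPNet, not by this repo):

```python
import torch
import torch.nn as nn

# Hypothetical head outputs: one main prediction and two auxiliary ones,
# roughly matching BiSeNet's training setup (shapes are illustrative).
criterion = nn.CrossEntropyLoss()
labels = torch.randint(0, 19, (2, 32, 32))
main_out = torch.randn(2, 19, 32, 32)
aux_outs = [torch.randn(2, 19, 32, 32), torch.randn(2, 19, 32, 32)]

# Equal weighting, as used in the answer above:
loss_equal = criterion(main_out, labels) \
    + sum(criterion(o, labels) for o in aux_outs)

# Down-weighted auxiliaries, as the question suggests (lam is tunable):
lam = 0.4
loss_weighted = criterion(main_out, labels) \
    + lam * sum(criterion(o, labels) for o in aux_outs)
```

Since the auxiliary heads are dropped at inference time, either scheme only affects the gradients seen during training.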
Thanks for asking!! Hope it can be helpful to you.
I forgot to mention that, before PyTorch 1.8.1, there was a problem with the backward computation of F.interpolate when fp16 mode and bilinear mode were used together: training became very slow. This was fixed in PyTorch 1.8.1. Since this project depends on 1.6, I simply used PixelShuffle instead.
I am the owner of this repo.
I just wonder why people say the code is not correct without pointing out where the errors are. The code is short and easy to understand, so why not tell me directly what the problems are?
How should I prove it?