Closed taiqzheng closed 2 years ago
Hi, I'm a beginner in deep learning and have two questions about the paper. The first is about the SFE block: subsection '3.5. Stage-wise Feature Extraction (SFE) Module' states that the "stage predictions" are supervised against ground truth. I looked through the code of MapAdapter (the SFE block) and found only the scale parameters and the 1x1 and 3x3 convolution layers, so how do you achieve supervision against ground truth with these components? The figure below shows the code for the SFE module.
Thanks for your interest in our work. I am not sure I understand your question correctly. The basic idea of supervision, whether for a block/stage or for the whole network, is to generate predictions and, together with the ground truth, compute a loss with the loss functions; the gradient is then back-propagated to update all layers and parameters. To understand the process, you can refer to Solver.py.
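The answer above can be sketched as a toy stage-wise (deep) supervision loop: every stage prediction is compared against the same ground truth, and the summed loss is what gets back-propagated. The names `stage_preds`, `gt`, and the per-pixel BCE here are illustrative stand-ins, not the repo's actual code.

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    """Per-pixel binary cross-entropy, averaged over the map."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))))

def total_supervision_loss(stage_preds, gt):
    # Every stage prediction contributes a loss term; in training the
    # summed loss is back-propagated through all stages and parameters.
    return sum(bce(p, gt) for p in stage_preds)

gt = np.array([[0.0, 1.0], [1.0, 0.0]])
preds = [np.full((2, 2), 0.5),            # an uninformative early stage
         np.array([[0.1, 0.9], [0.8, 0.2]])]  # a sharper later stage
loss = total_supervision_loss(preds, gt)   # ≈ 0.857
```

In the real model the loss would be a PyTorch tensor so autograd can propagate gradients; this numpy version only shows how each stage contributes its own supervised term.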
The next question is about the RCSB block: its two inputs are actually the same.
In the figure above, for example, CSFB2 takes [M3_2, M3_2] as input x, and inside the CSFBBlock class (the RCSB block) there is ctr, sal = x.
- Does this mean the Ctr and Sal information are separated and trained by the two branches in the CSFBUnit class (the CSBU block)?
- Why not merge C3_map and S3_map with A2 separately, to get two different inputs for the RCSB block? Or do you mean that M3_2 already contains both C3_map and S3_map, so the current method is simply more convenient?
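The pattern being asked about can be shown with a minimal stand-in: the same merged map M is passed in twice as [M, M], and the block unpacks it into two branches that then diverge. The function body here is purely illustrative (the real branches are learned sub-networks, not these arithmetic stand-ins).

```python
# Hedged sketch: ctr and sal start from the SAME tensor; only the
# branch-specific processing (and its supervision) makes them specialize.
def csfb_block_sketch(x):
    ctr, sal = x                   # unpack the duplicated input
    ctr = [v * 0.5 for v in ctr]   # stand-in for the contour branch
    sal = [v + 1.0 for v in sal]   # stand-in for the saliency branch
    return ctr, sal

m = [1.0, 2.0, 3.0]                        # one merged map M
ctr_out, sal_out = csfb_block_sketch([m, m])  # same input, two outputs
```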
Thanks a lot for your answer and advice. I'm trying to figure out how to generate better Ctr information, and I tried adding an FFT module to extract high-frequency information (which I hoped would be edge information). The MAE of the model got worse, and the intermediate result produced by the FFT module is messy, with incomplete outlines. I'm now trying the Canny method, which should give better Ctr information; in the model the new Ctr information will be merged with S3_Mx2 and A2, etc. Do you have any suggestions for introducing other Ctr information? By the way, I also tried replacing ResNet-50 with EfficientNet-b0, which reduces the number of parameters to 9,403,004. The model was trained for 42 epochs and got almost the same MAE (on DUTS-TE, RCSB.pt got 0.036, this one got 0.0365). The figure below shows the contents of log.txt. Maybe EfficientNet-bx (x>0) would give better results, since the input resolution is bigger and may contain more information.
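For reference, the FFT high-frequency idea described above amounts to zeroing the low-frequency centre of the shifted spectrum and inverting; on a clean shape this keeps mostly the edges, but on natural images it also keeps noise, which may explain the messy intermediate maps. The image and cutoff below are made up for illustration.

```python
import numpy as np

def fft_highpass(img, cutoff=2):
    """Zero out a (2*cutoff+1)^2 block of low frequencies, then invert."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff + 1, cx - cutoff:cx + cutoff + 1] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0        # a bright square; its edges carry high freq
edges = fft_highpass(img)    # residual concentrates around the boundary
```

Since the DC component is removed, the output has zero mean; only the oscillating (edge/noise) content survives, which is exactly why raw FFT high-pass output tends to look ring-y and incomplete compared with a dedicated edge detector.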
Thanks for the feedback. Yes, a better encoder will definitely give better results, and reducing the parameters to about 10M is a really good try! As for edge detectors, I don't know much about them, only Sobel, Canny and Laplacian... I'm not sure whether they will help or not.
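The Sobel and Laplacian operators mentioned here are just small fixed kernels slid over the image, so they can be tried without any library beyond numpy. The sliding-window filter below is a plain valid-mode cross-correlation (the kernel is not flipped, which only changes the sign for Sobel); the test image is an invented vertical step edge.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def conv2d_valid(img, k):
    """Valid-mode cross-correlation of img with kernel k (no padding)."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

img = np.zeros((8, 8))
img[:, 4:] = 1.0                    # vertical step edge
gx = conv2d_valid(img, SOBEL_X)     # strong response only at the step
```

On flat regions both operators return zero, so as a Ctr prior they would respond only where intensity actually changes, unlike the FFT high-pass which also amplifies noise.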