WongKinYiu / CrossStagePartialNetworks

Cross Stage Partial Networks
https://github.com/WongKinYiu/CrossStagePartialNetworks

Training Steps Mismatch between the Paper and the Code in ImageNet Experiments #24

Open Chaimmoon opened 4 years ago

Chaimmoon commented 4 years ago

Hi,

In the ImageNet experiments, the paper says it should be trained for 800 epochs:

[screenshot from the paper]

However, the code says it should be trained for 80 epochs:

[screenshot from the cfg/code]

So there is a big difference.

Besides, I tried to re-implement it in PyTorch, and the accuracy is 7-8 points behind your method. The network architecture and number of parameters are the same as in your Darknet results.

Best, Mu

WongKinYiu commented 4 years ago

@Chaimmoon

Thank you for pointing out the typo. It should be 800,000, which is the same as in the cfg.

I have only implemented CSPDenseNet and CSPDarknet in PyTorch. Following are the results of (CSP)DenseNet-{121, 169, 201, 264} in PyTorch: [results table screenshot]. My PyTorch implementations of darknet53 and cspdarknet53 get 76.3/92.9 and 76.9/93.3 top-1/top-5 accuracy with 224x224 input resolution, respectively.

You should make sure the BN layers and activation functions are the same as in the provided cfg file.
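
For reference, a darknet `[convolutional]` block with `batch_normalize=1` and `activation=leaky` roughly corresponds to the PyTorch sketch below; the 0.1 negative slope is darknet's leaky default, while the BN eps/momentum are left at PyTorch defaults and may differ slightly from darknet's:

```python
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size, stride=1):
    """Rough PyTorch equivalent of a darknet [convolutional] block with
    batch_normalize=1 and activation=leaky. BN eps/momentum are PyTorch
    defaults and may differ slightly from darknet's."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                  padding=kernel_size // 2, bias=False),  # bias is folded into BN
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),                  # darknet leaky slope 0.1
    )
```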

WongKinYiu commented 4 years ago

@Chaimmoon

This is my PyTorch implementation of CSPDarknet: darknet.py.txt

I borrowed some functions from mmdetection and mmcv. The main difference between CSPDarknet and CSPResNe(X)t is that CSPDarknet uses a darknet_layer and CSPResNe(X)t uses a resne(x)t_layer:

            x = down_layer(x)               # downsampling convolution
            x1, x2 = x.chunk(2, dim=1)      # split channels into two halves
            x2 = darknet_layer(x2)          # residual blocks on one half only
            x = torch.cat([x1, x2], 1)      # merge the two partial paths
            x = tran_layer(x)               # transition convolution after the merge
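
For readers who do not open the attached file, here is a minimal sketch of how the snippet above might sit inside a stage module; the `conv_bn_leaky` helper is the same darknet-style block sketched earlier, and the widths, strides and block counts are assumptions for illustration, not the author's released code:

```python
import torch
import torch.nn as nn


def conv_bn_leaky(c_in, c_out, k, stride=1):
    """Darknet-style [convolutional] block: Conv + BN + LeakyReLU(0.1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )


class DarkResidual(nn.Module):
    """Darknet residual block (1x1 conv, 3x3 conv, skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            conv_bn_leaky(channels, channels, 1),
            conv_bn_leaky(channels, channels, 3),
        )

    def forward(self, x):
        return x + self.body(x)


class CSPStage(nn.Module):
    """One cross-stage-partial stage: downsample, split the channels in
    half, run residual blocks on one half only, then concatenate the two
    halves and fuse them with a transition conv."""
    def __init__(self, in_ch, out_ch, num_blocks):
        super().__init__()
        self.down_layer = conv_bn_leaky(in_ch, out_ch, 3, stride=2)
        self.darknet_layer = nn.Sequential(
            *[DarkResidual(out_ch // 2) for _ in range(num_blocks)])
        self.tran_layer = conv_bn_leaky(out_ch, out_ch, 1)

    def forward(self, x):
        x = self.down_layer(x)              # downsample
        x1, x2 = x.chunk(2, dim=1)          # partial split across channels
        x2 = self.darknet_layer(x2)         # dense path
        x = torch.cat([x1, x2], dim=1)      # merge with the shortcut path
        return self.tran_layer(x)
```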
Chaimmoon commented 4 years ago

> @Chaimmoon
>
> Thank you for pointing out the typo. It should be 800,000, which is the same as in the cfg.
>
> I have only implemented CSPDenseNet and CSPDarknet in PyTorch. Following are the results of (CSP)DenseNet-{121, 169, 201, 264} in PyTorch: [results table screenshot]. My PyTorch implementations of darknet53 and cspdarknet53 get 76.3/92.9 and 76.9/93.3 top-1/top-5 accuracy with 224x224 input resolution, respectively.
>
> You should make sure the BN layers and activation functions are the same as in the provided cfg file.

@WongKinYiu

Thanks for your reply!

I implemented ResNet10, ResNet50 and ResNeXt50. The results are not as good as your paper reports... (Besides, can you provide the cfg file for the ResNet10_CSP? The architectures for ResNet10 and ResNet50 are quite different.)

As for the BN, it should be torch.nn.BatchNorm2d, and the activation function should be torch.nn.LeakyReLU, right?

Can you provide your PyTorch code? Thanks


Best, Mu

WongKinYiu commented 4 years ago

@Chaimmoon

My PyTorch code is posted on https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/24#issuecomment-623125410.

I am sorry that I cannot release my lightweight models due to some issues. You can try to follow the rule of ResNet50 -> CSPResNet50 to modify ResNet10 -> CSPResNet10.

nyj-ocean commented 4 years ago

@WongKinYiu Thanks for your work! I have a question about [sam] layers.

In https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-518618199, the SAM module consists of one [convolutional] layer and one [sam] layer, like the following: [diagram]

while in https://github.com/AlexeyAB/darknet/issues/5355#issuecomment-619859913, the SAM module consists of two [convolutional] layers and one [sam] layer, not one [convolutional] layer, like the following:

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=logistic

[sam]
from=-2

What's more, in https://github.com/AlexeyAB/darknet/issues/5355#issuecomment-619859913 the [convolutional] layer in front of the [sam] layer has pad=1, while in https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-518618199 the [convolutional] layer in front of the [sam] layer does not have pad=1.

I want to know which [sam] layer is correct?

WongKinYiu commented 4 years ago

@nyj-ocean Hello,

  1. In https://github.com/AlexeyAB/darknet/issues/5355#issuecomment-619859913

    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=512
    activation=mish

    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=logistic

    [sam]
    from=-2

which is the SAM module.
![image](https://user-images.githubusercontent.com/12152972/81059524-23695180-8f03-11ea-9498-d17c8277739a.png)

  2. In https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-518618199
![](https://user-images.githubusercontent.com/55009815/81057829-c6b86780-8eff-11ea-8618-da086c5815cf.png)
which is the usage of the [sam] layer.
![image](https://user-images.githubusercontent.com/12152972/81059834-d639af80-8f03-11ea-8cc3-3c7d7d6ca096.png)

  3. `pad=1` and `pad=0` are the same when the convolutional filter size is `1x1`.
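
In PyTorch terms, the three-layer pattern in the cfg above (3x3 mish conv, 1x1 conv with logistic activation, then `[sam] from=-2` multiplying the two outputs element-wise) could be sketched roughly as below. This is only an illustration of the cfg, not an exact port of the darknet layers, and `nn.Mish` needs a recent PyTorch:

```python
import torch
import torch.nn as nn


class ModifiedSAM(nn.Module):
    """Point-wise spatial attention following the cfg above: a 3x3 conv
    (mish) produces features, a 1x1 conv with sigmoid ('logistic' in
    darknet) produces a per-point mask, and [sam] from=-2 multiplies the
    two element-wise. Both convs carry batch_normalize=1 in the cfg."""
    def __init__(self, channels):
        super().__init__()
        self.feat = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Mish(),
        )
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.feat(x)   # first [convolutional], activation=mish
        a = self.attn(f)   # second [convolutional], activation=logistic
        return a * f       # [sam] from=-2: element-wise multiplication
```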
nyj-ocean commented 4 years ago

@WongKinYiu Thanks for your reply. I want to add the SAM module to YOLOv3. Can you help me check whether the following cfg is right?

SAM-to-yolov3.cfg.txt

WongKinYiu commented 4 years ago

@nyj-ocean

the latest [sam] block seems to be at a different layer compared with the 1st and 2nd [sam] blocks in your cfg file.

and in my previous experiments, I used the sam layer as in: SAM-to-yolov3.cfg.txt

nyj-ocean commented 4 years ago

@WongKinYiu Thanks for your help! I noticed that the yolov4 paper mentions a modified SAM block. Is the SAM block in your provided SAM-to-yolov3.cfg.txt (https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/24#issuecomment-624093575) equal to the modified SAM block mentioned in yolov4?

WongKinYiu commented 4 years ago

Yes, it is the same. And the comparison with/without SAM is posted in the 1st table of the readme in this repo.

nyj-ocean commented 4 years ago

@WongKinYiu thanks for your help!!!

Chaimmoon commented 4 years ago

@WongKinYiu

Hi, I have checked the network structure and number of parameters in my CSPResNet/CSPResNeXt PyTorch implementation, which are the same as what you reported in your GitHub README file, including nn.BatchNorm2d, nn.LeakyReLU, training epochs, batch size and learning rate schedule. I also had a close look at your Darknet PyTorch implementation. However, the accuracy is still below yours...

My Results:

  • CSPResNet50: Prec@1 75.772 Prec@5 92.716 (paper results: 76.6% / 93.3%)
  • CSPResNeXt50: Prec@1 76.328 Prec@5 93.058 (paper results: 77.9% / 94.0%)

Thanks!

WongKinYiu commented 4 years ago

@Chaimmoon

I am not sure whether it is important or not; I just follow https://pjreddie.com/darknet/imagenet/.

And I think getting a little bit lower accuracy is normal, since darknet uses 256x256 for validation, and I guess your PyTorch code uses 224x224 instead. My CSPDarknet53 PyTorch (224x224) implementation also gets 0.6% lower top-1 accuracy than the Darknet (256x256) implementation.

Could you share your code of CSPResNet / CSPResNeXt? I would like to upload the implementation and results to the pytorch branch if that is OK.

nyj-ocean commented 4 years ago

@WongKinYiu I'm sorry to bother you again.

I notice that the modified SAM in the yolov4 paper references the CBAM paper.

However, I also find that the ThunderNet paper also designs a SAM.

so I want to know:

  1. Is the SAM in the CBAM paper the same as the SAM in the ThunderNet paper?

  2. In the yolov4 paper, the modified SAM references the CBAM paper. But in https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-518583264, LukeAI said the [sam] layer is for ThunderNet. Are the two statements in conflict? Which one is correct?

WongKinYiu commented 4 years ago

@nyj-ocean

There are many kinds of channel attention modules (CAM) and spatial attention modules (SAM) in the literature. For example, SENet and SKNet proposed different kinds of CAM, and CBAM and ThunderNet proposed different kinds of SAM. In general, we cite the first paper, or the most similar paper, or both, in related work. So the answers to your questions are:

  1. Is the SAM in the CBAM paper the same as the SAM in the ThunderNet paper?

No, they are different.

  2. In the yolov4 paper, the modified SAM references the CBAM paper. But in AlexeyAB/darknet#3708 (comment), LukeAI said the [sam] layer is for ThunderNet. Are the two statements in conflict? Which one is correct?

CBAM is the first paper which proposed SAM, so we cite it in the yolov4 paper. ThunderNet proposed the SAM module most similar to ours, so we cite it in the cspnet paper. SAM in CBAM: [figure] SAM in ThunderNet: [figure]

nyj-ocean commented 4 years ago

@WongKinYiu Thanks for your reply. The yolov4 paper modifies SAM from spatial-wise attention to point-wise attention, so is the SAM module before modification in yolov4 (that is, spatial-wise attention) similar to the SAM module in the CBAM paper?

WongKinYiu commented 4 years ago

Yes, all the different kinds of SAM modules produce spatial attention.

nyj-ocean commented 4 years ago

@WongKinYiu thanks a lot

Chaimmoon commented 4 years ago

> @Chaimmoon
>
> I am not sure whether it is important or not; I just follow https://pjreddie.com/darknet/imagenet/.
>
> And I think getting a little bit lower accuracy is normal, since darknet uses 256x256 for validation, and I guess your PyTorch code uses 224x224 instead. My CSPDarknet53 PyTorch (224x224) implementation also gets 0.6% lower top-1 accuracy than the Darknet (256x256) implementation.
>
> Could you share your code of CSPResNet / CSPResNeXt? I would like to upload the implementation and results to the pytorch branch if that is OK.

Hi @WongKinYiu

Thanks for your reply! I think that during training and testing, the Darknet framework keeps the image size at 256x256. However, for common PyTorch training, the training size is 224x224 and the test size is 256x256. Is my understanding right?

WongKinYiu commented 4 years ago

@Chaimmoon

It depends on your code. The most common testing protocol in PyTorch is single-crop (224x224): https://pytorch.org/docs/stable/torchvision/models.html. The other common testing protocols nowadays are 10-crop (224x224, 5-crop + flip), 5-crop (224x224, center + 4 corners), and full (256x256).
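
For illustration, the single-crop protocol usually corresponds to the torchvision-style preprocessing below; the normalization statistics are the common ImageNet values, and the full-image 256x256 variant is an assumption about how the darknet protocol would be mirrored in PyTorch:

```python
from torchvision import transforms

imagenet_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

# Single-crop (224x224): resize the short side to 256, center-crop 224.
single_crop_224 = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    imagenet_norm,
])

# "Full" (256x256) evaluation, closer to the darknet protocol: feed the
# whole image at 256x256 instead of cropping a 224x224 patch.
full_256 = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    imagenet_norm,
])
```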

nyj-ocean commented 4 years ago

@WongKinYiu I'm sorry to bother you again. I want to produce the picture about the anchors of yolov3, like the following, but I don't know how to do it. Can you tell me how to produce this picture about the anchors?

[screenshot of the anchors picture]

WongKinYiu commented 4 years ago

@nyj-ocean

I do not know either; I always use the anchors which yolo9000 calculated.

AlexeyAB commented 4 years ago

You can calculate new anchors by using this command: `./darknet detector calc_anchors coco.data -num_of_clusters 9 -width 512 -height 512 -show`

[cloud.png produced by calc_anchors]
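
For readers who cannot run darknet, calc_anchors essentially runs k-means over the labelled box sizes scaled to the network resolution. A hypothetical Python sketch of the same idea follows; it uses Euclidean k-means (darknet uses an IoU-based distance) and a white plot background, so it will not reproduce darknet's cloud.png exactly:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def calc_anchors(wh, num_clusters=9, width=512, height=512, out="cloud.png"):
    """k-means over relative box sizes (w, h in 0..1) scaled to the network
    resolution; darknet uses an IoU-based distance, so results will differ."""
    wh = np.asarray(wh, dtype=float) * [width, height]
    km = KMeans(n_clusters=num_clusters, n_init=10).fit(wh)
    # Sort the anchors by area, from small to large.
    anchors = km.cluster_centers_[np.argsort(km.cluster_centers_.prod(axis=1))]

    # Scatter the boxes and anchor centroids; white background instead of black.
    fig, ax = plt.subplots(facecolor="white")
    ax.scatter(wh[:, 0], wh[:, 1], s=2, c=km.labels_, cmap="tab10")
    ax.scatter(anchors[:, 0], anchors[:, 1], c="black", marker="x")
    ax.set_xlabel("box width (px)")
    ax.set_ylabel("box height (px)")
    fig.savefig(out, facecolor="white")  # a .eps filename gives EPS output
    return anchors
```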

nyj-ocean commented 4 years ago

@WongKinYiu thanks for your reply

@AlexeyAB Thank you so much!! It helps me a lot! If the background color of cloud.png were white, it would be better for me. How can I change the background color of cloud.png from black to white?

AlexeyAB commented 4 years ago
nyj-ocean commented 4 years ago

@AlexeyAB great! thanks a lot

nyj-ocean commented 4 years ago

@AlexeyAB Sorry to bother you again. I use the following command to generate my cloud.png on my own dataset: `./darknet detector calc_anchors my-own-dataset.data -num_of_clusters 9 -width 608 -height 608 -show`

The following figure is my cloud.png:

[cloud.png figure]

I find that there are many black spare parts in my own cloud.png. However, there are almost no black spare parts in the cloud.png of the coco dataset; the anchors almost fill the whole cloud.png of the coco dataset (see https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/24#issuecomment-627941826).

WongKinYiu commented 4 years ago

I guess the images in your dataset are from videos.

AlexeyAB commented 4 years ago

What are the black spare parts? There is no problem.

nyj-ocean commented 4 years ago

@AlexeyAB The black spare parts are like the following:

[screenshot]

There are many black spare parts in my own cloud.png. However, there are almost no black spare parts in the cloud.png of the coco dataset (see https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/24#issuecomment-627941826).

nyj-ocean commented 4 years ago

@WongKinYiu The images in my dataset are not taken from videos

AlexeyAB commented 4 years ago

> Why are there many black spare parts in my own cloud.png?

Because your objects are small relative to the image size. This is normal.

Just maybe you should use a higher network resolution for anchor calculation, training and detection to get good results.

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

> Only if you are an expert in neural detection networks: recalculate anchors for your dataset for the width and height from the cfg-file: `darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416`, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of the anchor masks= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining. Also you should change the filters=(classes + 5)* before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.

nyj-ocean commented 4 years ago

@AlexeyAB Thank you so much

nyj-ocean commented 3 years ago

@AlexeyAB Sorry to bother you again. `./darknet detector calc_anchors coco.data -num_of_clusters 9 -width 512 -height 512 -show` creates cloud.png. If it could create cloud.eps instead, that would be better for me. How can I change cloud.png from png to eps?

AnhPC03 commented 3 years ago

> @WongKinYiu
>
> Hi, I have checked the network structure and number of parameters in my CSPResNet/CSPResNeXt PyTorch implementation, which are the same as what you reported in your GitHub README file, including nn.BatchNorm2d, nn.LeakyReLU, training epochs, batch size and learning rate schedule. I also had a close look at your Darknet PyTorch implementation. However, the accuracy is still below yours...
>
> My Results:
>
> • CSPResNet50: Prec@1 75.772 Prec@5 92.716 (paper results: 76.6% / 93.3%)
> • CSPResNeXt50: Prec@1 76.328 Prec@5 93.058 (paper results: 77.9% / 94.0%)
>
> Thanks!

@Chaimmoon Could you share with me your code of CSPResNet50? Thank you.

nyj-ocean commented 3 years ago

@WongKinYiu

I'm sorry to bother you again.

I have another question about SAM module

The yolov4 paper modifies SAM from spatial-wise attention to point-wise attention.

  1. I cannot fully understand how yolov4 modifies SAM from spatial-wise attention to point-wise attention. Does it mean that yolov4 changes SAM from max-pooling and average-pooling to convolution layers?

  2. What is point-wise attention? Is point-wise attention equal to a convolution layer?

WongKinYiu commented 3 years ago

channel-wise: each channel has one attention value (1x1xc).
spatial-wise: each position has one attention value (wxhx1).
point-wise: each feature point has one attention value (wxhxc).
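
Purely as an illustration of those shapes (not code from any of the papers), the three kinds of attention differ only in the shape of the mask they produce:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # B x C x H x W feature map

# channel-wise: one value per channel -> B x C x 1 x 1 (SE-style squeeze)
channel_attn = torch.sigmoid(
    nn.Conv2d(64, 64, 1)(x.mean(dim=(2, 3), keepdim=True)))

# spatial-wise: one value per position -> B x 1 x H x W
# (CBAM additionally pools over channels before its 7x7 conv)
spatial_attn = torch.sigmoid(nn.Conv2d(64, 1, 7, padding=3)(x))

# point-wise (YOLOv4's modified SAM): one value per feature point -> B x C x H x W
point_attn = torch.sigmoid(nn.Conv2d(64, 64, 1)(x))

print(channel_attn.shape, spatial_attn.shape, point_attn.shape)
# torch.Size([1, 64, 1, 1]) torch.Size([1, 1, 32, 32]) torch.Size([1, 64, 32, 32])
```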

nyj-ocean commented 3 years ago

@WongKinYiu

Thanks for your reply.

What I understand about yolov4 modifying SAM from spatial-wise attention to point-wise attention is that yolov4 uses a 1x1 convolution layer to replace the maxpool, avgpool and 7x7 convolution layer, just like the following:

[figure]

  1. Is my understanding correct?

  2. If my understanding is correct, can you tell me why yolov4 modifies SAM from spatial-wise attention to point-wise attention? What are the benefits of making this modification? Is it to reduce inference time? https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-528140698

These questions are very troubling to me. I look forward to your answers. Thanks a lot.