WongKinYiu / CrossStagePartialNetworks

Cross Stage Partial Networks
https://github.com/WongKinYiu/CrossStagePartialNetworks

Bad inference performance with CSPResNeXt50-PANet-SPP #7

Open ekarabulut opened 4 years ago

ekarabulut commented 4 years ago

Hi,

I've been evaluating CSPResNeXt50-PANet-SPP for real-time human detection. According to the README of this repository, CSPResNeXt50-PANet-SPP achieves a higher AP than YOLOv3 on the COCO dataset.

To verify this, I downloaded the cfg and weights for CSPResNeXt50-PANet-SPP and compared it with YOLOv3 (yolov3.cfg + yolov3.weights, trained on COCO).

As far as I could observe, CSPResNeXt50-PANet-SPP is not better than YOLOv3, at least for my case of detecting humans in video streams. Here is an example frame with the results from both networks:

1) Inference result with CSPResNeXt50-PANet-SPP: (image: CSPResNeXt50-PANet-SPP_detection)

2) Inference result with YOLOv3: (image: yolov3_detection)

My question is whether these images represent a special case in which CSPResNeXt50-PANet-SPP performs worse than YOLOv3, for instance on small objects such as the humans in these images. If not, what is the best explanation for this result?

Thanks in advance.

AlexeyAB commented 4 years ago

You have provided too little information, and you didn't attach the source image needed to reproduce this issue. Maybe you are running CSPResNeXt50-PANet-SPP and YOLOv3 at different network resolutions, or you are doing something else wrong.

  1. Attach the source image.
  2. Show the detection result using cfg https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/csresnext50-panet-spp-original-optimal.cfg and weights https://drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc

WongKinYiu commented 4 years ago

The most likely explanation is that the default input size of CSPResNeXt50-PANet-SPP is 416, while the default input size of YOLOv3 (https://pjreddie.com/darknet/yolo/) is 608, if you simply downloaded the cfg/weights and tested them.
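One quick way to check this is to read the width and height from the [net] section of each cfg before comparing results. The following is a minimal sketch, assuming the usual Darknet cfg layout (a [net] section containing width=... and height=... lines); the helper name read_net_size and the sample cfg fragments are hypothetical.

```python
def read_net_size(cfg_text):
    """Return (width, height) from the [net] section of Darknet-style cfg text.

    Missing values are returned as None.
    """
    section = None
    size = {}
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()  # entering a new section
            continue
        if section == "net" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in ("width", "height"):
                size[key] = int(value)
    return size.get("width"), size.get("height")


# Hypothetical cfg fragments, for illustration only.
cfg_a = "[net]\nbatch=64\nwidth=416\nheight=416\n[convolutional]\nfilters=32\n"
cfg_b = "[net]\nwidth=608\nheight=608\n"
print(read_net_size(cfg_a), read_net_size(cfg_b))
```

If the two models report different sizes, the comparison is not like for like until both are run at the same resolution.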

Also, please note that the big bounding box is a correct detection, since the COCO dataset labels such groups with the iscrowd tag. https://github.com/ultralytics/yolov3/issues/714#issuecomment-565570001 https://github.com/ultralytics/yolov3/issues/714#issuecomment-565657113

ekarabulut commented 4 years ago

Hi @AlexeyAB and @WongKinYiu

As @AlexeyAB suggested, I downloaded the cfg and weights files from the links in the post. I used a video for inference, and the picture is from the first second of that video. The command I used was:

./darknet detector demo data/coco.data csresnext50-panet-spp-original-optimal.cfg csresnext50-panet-spp-original-optimal_final.weights sys6_day.mp4 -out_filename sys6_day_predictions.mp4 -dont_show

The source video is at this URL: https://drive.google.com/open?id=1vquc2v9jpA5WkOEkvsNqMGCb-5XI7GXw

As a result of this inference, the same frame now looks like this: (image: Screenshot from 2020-01-09 21-55-37)

According to this result, the YOLOv3 inference still looks better. To me, it is better in most parts of the video, not just in this example frame. Any ideas?

AlexeyAB commented 4 years ago

@ekarabulut

Try to run detection with flag -thresh 0.1

./darknet detector demo cfg/coco.data cfg/csresnext50-panet-spp-original-optimal.cfg csresnext50-panet-spp-original-optimal_final.weights sys6_day.mp4 -thresh 0.1

> According to this result, the YOLOv3 inference still looks better. To me, it is better in most parts of the video, not just in this example frame. Any ideas?

It depends on the dataset. On MS COCO, the model csresnext50-panet-spp-original-optimal.cfg performs better than yolov3.cfg on AP50, AP, and APsmall, and especially on the person class.


AlexeyAB commented 4 years ago

@WongKinYiu Did you use darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 to recalculate the anchors for csresnext50-panet-spp-original-optimal.cfg? Or did you change the anchors manually?

WongKinYiu commented 4 years ago

@AlexeyAB

No, I just use [original_anchors*512/416].
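The scaling described here can be sketched as follows. This assumes that original_anchors refers to the standard nine COCO anchors from yolov3.cfg and that the scaled values are rounded to integers; the thread confirms neither, and scale_anchors is a hypothetical helper.

```python
# Standard YOLOv3 COCO anchors (width, height) from yolov3.cfg.
# Assumption: these are the "original_anchors" meant in the comment above.
ORIGINAL_ANCHORS = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]


def scale_anchors(anchors, old_size, new_size):
    """Scale each (w, h) anchor by new_size / old_size, rounding to integers."""
    return [
        (round(w * new_size / old_size), round(h * new_size / old_size))
        for w, h in anchors
    ]


scaled = scale_anchors(ORIGINAL_ANCHORS, old_size=416, new_size=512)
print(scaled)
```

Because every anchor grows by the same factor, the smallest anchor also grows, which is consistent with the later remark that a larger input size is needed to detect small objects.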

WongKinYiu commented 4 years ago

@ekarabulut hello,

The reason for https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/7#issuecomment-572707936 is that the model uses larger anchors, so you need a larger input size to detect small objects.

The reason for https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/7#issue-547045138 is mainly that the input sizes are different. There is also another important reason: YOLOv3 is trained with letter_box, while cspresnext50-panet-spp is trained with resize, and images in COCO usually have a larger height than width, so using -letter_box can solve the problem. (The best solution is to re-train cspresnext50-panet-spp with letter_box=1.)
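To illustrate the difference between the two preprocessing modes, here is a minimal sketch of the geometry only. It is not Darknet's implementation; the function names and the 1920x1080 frame size are assumptions chosen for illustration.

```python
def resize_geometry(img_w, img_h, net_w, net_h):
    """Plain resize: stretch the image to the network size (aspect ratio may change)."""
    return {"new_w": net_w, "new_h": net_h, "pad_x": 0, "pad_y": 0}


def letterbox_geometry(img_w, img_h, net_w, net_h):
    """Letterbox: scale uniformly to fit inside the network input, then pad the rest."""
    if net_w * img_h <= net_h * img_w:
        # Width is the limiting dimension.
        new_w = net_w
        new_h = max(1, round(img_h * net_w / img_w))
    else:
        # Height is the limiting dimension.
        new_h = net_h
        new_w = max(1, round(img_w * net_h / img_h))
    return {
        "new_w": new_w,
        "new_h": new_h,
        "pad_x": (net_w - new_w) // 2,  # padding on the left side
        "pad_y": (net_h - new_h) // 2,  # padding on the top side
    }


# Hypothetical 1920x1080 video frame fed to a 512x512 network input.
print(resize_geometry(1920, 1080, 512, 512))
print(letterbox_geometry(1920, 1080, 512, 512))
```

With plain resize the 16:9 frame is stretched into a square, so object proportions change; with letterbox the proportions are preserved and the remaining area is padding.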

After changing the input size to fit the anchors and testing with -letter_box, csresnext50-panet-spp-original-optimal gets the following result: (image: predictions)

Compared with YOLOv3: (image: yolov3_detection)

rajhlinux commented 1 year ago

> 2. weights drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc

I get a 404 error for the weights link: drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc