Accuracy and speed of yolov4x-mish

Goru1890 commented 3 years ago

Which is the improvement with the new function new_coords over traditional yolov4? Did someone try it with COCO?



If you do not get an answer for a long time, try to find the answer among Issues with a Solved label: https://github.com/AlexeyAB/darknet/issues?q=is%3Aopen+is%3Aissue+label%3ASolved

AlexeyAB commented 3 years ago

YOLOv4x-mish - 640x640 - COCO-testdev-2019: 49.4% AP - 67.9% AP50 stdout.txt
- GPU RTX 2070 - 23 FPS
- GPU RTX 3090 - 30 FPS
- GPU V100 - ~50 FPS
YOLOv4x-mish - 672x672 - COCO-testdev-2019: 49.6% AP - 68.1% AP50 stdout.txt, GPU RTX 2070 - 21 FPS, GPU V100 - 45 FPS

So currently it is much better than PP-YOLO, EfficientDet, SpineNet and many other models.

Darknet:

Pytorch: https://github.com/WongKinYiu/PyTorch_YOLOv4

overall performance Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.496 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.681 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.540 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.307 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.537 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.617 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.377 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.616 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.656 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.454 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.813 Done (t=835.94s)

AlexeyAB commented 3 years ago

@mive93 Hi, Could you port it to tkDNN / TRT please?

mive93 commented 3 years ago

Hi @AlexeyAB, sure, I can do that. What are the main changes wrt yolov4?

arnaud-nt2i commented 3 years ago

@AlexeyAB Hi ! Is yolov4x-mish ready to train on custom dataset ?

sctrueew commented 3 years ago

@AlexeyAB Hi,

Does OpenCV-dnn also support?

AlexeyAB commented 3 years ago

@arnaud-nt2i

Is yolov4x-mish ready to train on custom dataset ?

I didn't test it well:

Try new_coords=1 if there will be bad results then try to train with new_coords=0
Use pre-trained weights https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.conv.166
Make sure that batch=64 and subdivisions <= 16
If you will get Nan - you should set max_delta=20 for each [yolo]-layer, and set learning_rate=0.001 for [net]
And I haven't added exponential moving average (EMA) yet.

AlexeyAB commented 3 years ago

@mive93 Hi,

If there is set [yolo] new_coords=1 then:

We use Logistic (sigmoid) not only for x,y, but for x,y,w,h https://github.com/AlexeyAB/darknet/commit/8c9c5171891ea92b0cbf5c7fddf935df0b854540#diff-a191a7d286ab1bacf527ae4b5edfbad6951b06a4d80685393577af64eb8e8a8fR950
The coordinates should be calculated in this way: https://github.com/AlexeyAB/darknet/commit/8c9c5171891ea92b0cbf5c7fddf935df0b854540#diff-a191a7d286ab1bacf527ae4b5edfbad6951b06a4d80685393577af64eb8e8a8fR141-R144

So in total:

x = (logistic(in) * 2 - 0.5 + grid_x) / grid_width
y = ...
w = pow( logistic(in)*2, 2) * anchor / network_width
h = ...

We use nms=0.6 instead of 0.45
We use diounms() https://github.com/AlexeyAB/darknet/commit/c7e3e2ee9ec2d8fff447d83736e13bce0938015f#diff-2c2b9046564ae9ad1ba54f4b42a3c8acbf98af531e411be6281687f6b6689e98L916

AlexeyAB commented 3 years ago

@zpmmehrdad

Does OpenCV-dnn also support?

Currently no, they need the same fixes.

mive93 commented 3 years ago

@AlexeyAB,

thank you, I will come back to you as soon as I have some results.

mive93 commented 3 years ago

Hi @AlexeyAB One question: were the weights computed with new_coords=0? I'm asking because when I convert the weights and create the network the output corresponds if new_coords=0, however when I run the demo new_coords should be 1 to have correct boxes. If that is the case, then I have completed the porting and I can push it.

The mAP on tkDNN is the following (with thresh=0.001 and COCO_val_2017)

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.463
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.645
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.507
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.305
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.365
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.641
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.463
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.787
Done (t=175.86s)

Then I can test the performance on the Xavier. My 2080ti is under training, so the performance are a bit degraded, right now I can tell you that FP32 is around 30FPS and FP16 is around 58FPS.

AlexeyAB commented 3 years ago

@mive93

Thanks!

One question: were the weights computed with new_coords=0?

What do you mean? We have to use all these calculations: https://github.com/AlexeyAB/darknet/issues/6987#issuecomment-729206623

I'm asking because when I convert the weights and create the network the output corresponds if new_coords=0, however when I run the demo new_coords should be 1 to have correct boxes. If that is the case, then I have completed the porting and I can push it.

yolov4x-mish.cfg uses new_coords=1 for all [yolo] layers. Do you use new_coords=1 too?

The mAP on tkDNN is the following (with thresh=0.001 and COCO_val_2017)

overall performance Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.463 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.645

Seems to be it is too small. It should be ~50.0% AP and ~68.5% AP50 for COCO2017-val for yolov4x-mish 672x672

mive93 commented 3 years ago

Hi @AlexeyAB,

never mind, I solved the export problem. The issue is that I convert weights and get the debug output for each layer without using the GPU, and new_coords is not implemented for CPU only (maybe you want to change it here https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L374).

I am checking now for the mAP loss. Will let you know as soon as I solve it.

AlexeyAB commented 3 years ago

@mive93 Hi, Thanks, I fixed it: https://github.com/AlexeyAB/darknet/commit/d18e22ae1bd46e57618af925b56eb187d4485de9

duynguyen51 commented 3 years ago

@AlexeyAB Hi, I use YOLOv4x-mish config in my own dataset, but avr loss do not change after over 1000 iteration ( I set batch_size=64). Avg loss remain at 100. How can I fix it ? Can I set new_coords = 0 ? Thanks

AlexeyAB commented 3 years ago

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

duynguyen51 commented 3 years ago

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

Thanks, let me check the result after those iter.

duynguyen51 commented 3 years ago

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

chart_yolo_mish

Hi, this is my mAP after 30% max_iter

AlexeyAB commented 3 years ago

@duynguyen51 Can you set max_delta= for different yolo layers, and restart training from 10 000 iterations? ./darknet detector train ... backup/yolov4x-mish_10000.weights

[yolo]
max_delta=20
...

[yolo]
max_delta=5
...

[yolo]
max_delta=2

duynguyen51 commented 3 years ago

@duynguyen51 Can you set max_delta= for different yolo layers, and restart training from 10 000 iterations? ./darknet detector train ... backup/yolov4x-mish_10000.weights
[yolo]
max_delta=20
...

[yolo]
max_delta=5
...

[yolo]
max_delta=2

Thanks, let me try it.

AlexeyAB commented 3 years ago

@duynguyen51 Also set learning_rate=0.001

AlexeyAB commented 3 years ago

@duynguyen51 If it doesn't help - try to set and train

[net] 
try_fix_nan=1

Goru1890 commented 3 years ago

How many Gb does your graphic card need to train it? Doesn't work with my nvidia gtx 2070(11 Gb) using 16 subdivisions.

AlexeyAB commented 3 years ago

@Goru1890 I can train https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4x-mish.cfg on RTX 3090 - 24 GB VRAM with parameters:

[net]
width=640
height=640
batch=64
subdivisions=8
optimized_memory=1

arnaud-nt2i commented 3 years ago

@AlexeyAB You said :

Make sure that batch=64 and subdivisions <= 16

Is batch 64 really mandatory or it is just to set a mini-batch size minimum (4) Eg: can we set batch=63 and subdivisions=7 or batch=70 subdivisions=7 like in other networks?

AlexeyAB commented 3 years ago

@arnaud-nt2i

Eg: can we set batch=63 and subdivisions=7 or batch=70 subdivisions=7 like in other networks?

Yes, you can. I

arnaud-nt2i commented 3 years ago

ok thanks, Some other questions:

Why not using batch normalize=2 it led to good results (+~0.5 mAP) on my own tests (on traditional Yolo V4 mish) ?
Is letterBox mandatory if (Mean training image ratio) ~= (Network ratio) and of corse : train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

AlexeyAB commented 3 years ago

Why not using batch normalize=2 it led to good results (+~0.5 mAP) on my own tests (on traditional Yolo V4 mish) ?

batch normalize=2 Sometimes it works better, sometimes worse.

Is letterBox mandatory if (Mean training image ratio) ~= (Network ratio) and of corse : train_network_width train_obj_width / train_image_width ~= detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height ~= detection_network_height detection_obj_height / detection_image_height

No. letter_box=1 is prefered if aspect ratio different for different images and network resolutions.

toplinuxsir commented 3 years ago

Is same with https://github.com/WongKinYiu/ScaledYOLOv4 ?

toplinuxsir commented 3 years ago

I train for my custom dataset , when iterations go above 1000, caculate mAP for every iteration , Is that normal ? Thanks

Goru1890 commented 3 years ago

No. letter_box=1 is prefered if aspect ratio different for different images and network resolutions.

So if I have in my dataset only images with the same ratio and resolution, may I put letter_box=0 ?

AlexeyAB commented 3 years ago

@toplinuxsir I fixed it.

AlexeyAB commented 3 years ago

@Goru1890 Yes you can.

AlexeyAB commented 3 years ago

@toplinuxsir

Is same with https://github.com/WongKinYiu/ScaledYOLOv4 ?

Yes. https://arxiv.org/abs/2011.08036

toplinuxsir commented 3 years ago

@AlexeyAB I trained for my custom dataset , for yolov4 is normal, but for yolov4x-mish , near 2000 iterations , the avg loss is 1339 and mAP is always 0 Is that normal ? Thanks

 Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % 
 1986: 1518.558960, 1420.304321 avg loss, 0.001000 rate, 7.550015 seconds, 127104 images, 1987.702662 hours left
Loaded: 4.403241 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.390271), count: 401, total_loss = 2855.301758 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.521089), count: 40, total_loss = 42.997162 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.647995), count: 2, total_loss = 1.097103 
 total_bbox = 9902341, rewritten_bbox = 0.237358 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.407130), count: 383, total_loss = 2931.999512 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.543540), count: 38, total_loss = 45.554615 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.617228), count: 4, total_loss = 0.458375 
 total_bbox = 9902766, rewritten_bbox = 0.237348 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.394439), count: 446, total_loss = 3446.613525 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.530023), count: 47, total_loss = 45.623585 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.604356), count: 3, total_loss = 0.951157 
 total_bbox = 9903262, rewritten_bbox = 0.237336 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382783), count: 752, total_loss = 5725.847656 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.483980), count: 57, total_loss = 55.402355 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.632100), count: 4, total_loss = 0.644276 
 total_bbox = 9904075, rewritten_bbox = 0.237367 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.363011), count: 808, total_loss = 5614.538574 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.551375), count: 63, total_loss = 66.249748 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.696846), count: 2, total_loss = 1.296881 
 total_bbox = 9904948, rewritten_bbox = 0.237346 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382413), count: 793, total_loss = 5648.040039 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.546819), count: 86, total_loss = 109.324928 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.663197), count: 3, total_loss = 0.766923 
 total_bbox = 9905830, rewritten_bbox = 0.237375 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.397971), count: 605, total_loss = 4430.015137 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.540062), count: 65, total_loss = 80.512238 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.653224), count: 5, total_loss = 2.803994 
 total_bbox = 9906505, rewritten_bbox = 0.237389 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.398305), count: 373, total_loss = 2787.349121 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.558826), count: 40, total_loss = 45.045372 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.612750), count: 2, total_loss = 0.871407 
 total_bbox = 9906920, rewritten_bbox = 0.237410 %

arnaud-nt2i commented 3 years ago

@AlexeyAB I trained for my custom dataset , for yolov4 is normal, but for yolov4x-mish , near 2000 iterations , the avg loss is 1339 and mAP is always 0 Is that normal ? Thanks

 Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % 
 1986: 1518.558960, 1420.304321 avg loss, 0.001000 rate, 7.550015 seconds, 127104 images, 1987.702662 hours left
Loaded: 4.403241 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.390271), count: 401, total_loss = 2855.301758 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.521089), count: 40, total_loss = 42.997162 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.647995), count: 2, total_loss = 1.097103 
 total_bbox = 9902341, rewritten_bbox = 0.237358 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.407130), count: 383, total_loss = 2931.999512 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.543540), count: 38, total_loss = 45.554615 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.617228), count: 4, total_loss = 0.458375 
 total_bbox = 9902766, rewritten_bbox = 0.237348 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.394439), count: 446, total_loss = 3446.613525 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.530023), count: 47, total_loss = 45.623585 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.604356), count: 3, total_loss = 0.951157 
 total_bbox = 9903262, rewritten_bbox = 0.237336 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382783), count: 752, total_loss = 5725.847656 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.483980), count: 57, total_loss = 55.402355 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.632100), count: 4, total_loss = 0.644276 
 total_bbox = 9904075, rewritten_bbox = 0.237367 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.363011), count: 808, total_loss = 5614.538574 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.551375), count: 63, total_loss = 66.249748 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.696846), count: 2, total_loss = 1.296881 
 total_bbox = 9904948, rewritten_bbox = 0.237346 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382413), count: 793, total_loss = 5648.040039 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.546819), count: 86, total_loss = 109.324928 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.663197), count: 3, total_loss = 0.766923 
 total_bbox = 9905830, rewritten_bbox = 0.237375 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.397971), count: 605, total_loss = 4430.015137 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.540062), count: 65, total_loss = 80.512238 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.653224), count: 5, total_loss = 2.803994 
 total_bbox = 9906505, rewritten_bbox = 0.237389 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.398305), count: 373, total_loss = 2787.349121 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.558826), count: 40, total_loss = 45.045372 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.612750), count: 2, total_loss = 0.871407 
 total_bbox = 9906920, rewritten_bbox = 0.237410 %

Same but IOU is nan...

Goru1890 commented 3 years ago

Same but IOU is nan...

Same issue...

arnaud-nt2i commented 3 years ago

Did Someone tried with "try_fix_nan=1" as well ?

arnaud-nt2i commented 3 years ago

@AlexeyAB One funny thing I have encountered while trying Yolov4x with your very exact parameters on RTX 3090: Model does not fit into memory "CUDA out of memory" unless I set optimize_memory=0 !! config: W10 CUDA-version: 11010 (11010), cuDNN: 8.0.5, CUDNN_HALF=1, GPU count: 1 CUDNN_HALF=1 OpenCV version: 4.5.0 Prepare additional network for mAP calculation... 0 : compute_capability = 860, cudnn_half = 1, GPU: GeForce RTX 3090 net.optimized_memory = 0 mini_batch = 1, batch = 8, time_steps = 1, train = 0

There seems to be a starting spick in memory usage when optimize_memory=1 that makes the model crash. Even if long-term the memory usage is lower than when optimize_memory=0

AlexeyAB commented 3 years ago

@arnaud-nt2i Thanks for notice!

Does anyone else have the same problem? So should I set [net] optimized_memory=0 by default?

arnaud-nt2i commented 3 years ago

@toplinuxsir @Goru1890 @duynguyen51 A lot has been done by AlexeyAB trying to fix yolov4x since our bug reports. Has somebody tried the latest fix ?

edit: I have found my answer here: https://github.com/WongKinYiu/ScaledYOLOv4/issues/13#issuecomment-739624075

The last commit seems fine, I will try it and report here.

Goru1890 commented 3 years ago

The last commit seems fine, I will try it and report here.

How did it go?

toplinuxsir commented 3 years ago

@arnaud-nt2i
https://github.com/opencv/opencv/issues/18975#issuecomment-740233812

toplinuxsir commented 3 years ago

@AlexeyAB , I tried the last commit for yolov4 and yolov4x-mish, some strange thing :

Both have higher mAP and higher avg loss
Although have higher mAP ,but both detection results have more missed than before. Is that normal ?

OkuChou commented 3 years ago

@arnaud-nt2i Thanks for notice!

Does anyone else have the same problem? So should I set [net] optimized_memory=0 by default?

Yes, i got same error. I also used rtx3090, i can only run properly when optimized_memory=0. However, try_fix_nan=1 works! And set last 3 [yolo] layer max_delta=20 After followed your indication, "-nan" error disappeared. The situation so far so good...

AlexeyAB commented 3 years ago

@mive93 Hi, Please, fix tkDNN for yolov4-csp and yolov4x-mish models: Currently, if there is new_coord=1, then [yolo] shouldn't use logistic (sigomid) activation for any values. Because activation=logistic now is used in the previous convolutional layer: https://github.com/AlexeyAB/darknet/blob/e7d029c11986156b818690bd1375a54286b8f315/cfg/yolov4x-mish.cfg#L1408-L1436

mive93 commented 3 years ago

Hi @AlexeyAB Sorry, I saw the comment only now (was submitting my phD thesis and had no time to breath). I will look into that in the following days.

mive93 commented 3 years ago

Hi @AlexeyAB, Scaled yolo4 is now supported, and I have also updated Yolov4x-mish (https://github.com/ceccocats/tkDNN/commit/adac8576b0faf515ad3f459b1f50fd16cef6d64d).

However, I think that in your new implementation of the Yolo layer could have problems with Yolov4. You have the scale add at the end, but it should not be there (https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L674):

            if (l.new_coords) {
                //activate_array(l.output + bbox_index, 4 * l.w*l.h, LOGISTIC);    // x,y,w,h
            }
            else {
                activate_array(l.output + bbox_index, 2 * l.w*l.h, LOGISTIC);        // x,y,
                int obj_index = entry_index(l, b, n*l.w*l.h, 4);
                activate_array(l.output + obj_index, (1 + l.classes)*l.w*l.h, LOGISTIC);
            }
            scal_add_cpu(2 * l.w*l.h, l.scale_x_y, -0.5*(l.scale_x_y - 1), l.output + bbox_index, 1);    // scale x,y

I think my solution is better (tested with all older models and works for everything) (https://github.com/ceccocats/tkDNN/blob/master/src/Yolo.cpp#L91)

            if (new_coords == 1){
                if (this->scaleXY != 1) scalAdd(dstData + index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
            }
            else{
                activationLOGISTICForward(srcData + index, dstData + index, 2*dim.w*dim.h);

                if (this->scaleXY != 1) scalAdd(dstData + index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);

                index = entry_index(b, n*dim.w*dim.h, 4, classes, input_dim, output_dim);
                activationLOGISTICForward(srcData + index, dstData + index, (1+classes)*dim.w*dim.h);
            }

AlexeyAB commented 3 years ago

@mive93 Hi, My implementation is equal to this one in your repo (note that I use int bbox_index instead of int index for scalAdd):

in this case we will not confuse index with bbox_index

if we will want to change x,y-scaling, we should change only 1 line instead of 2 lines

        int bbox_index = entry_index(b, n*dim.w*dim.h, 0, classes, input_dim, output_dim);
        std::cout<<"new_coords"<<new_coords<<std::endl;
        if (new_coords == 1){
            // nothing
        }
        else{
            activationLOGISTICForward(srcData + bbox_index, dstData + bbox_index, 2*dim.w*dim.h);

            int obj_cls_index = entry_index(b, n*dim.w*dim.h, 4, classes, input_dim, output_dim);
            activationLOGISTICForward(srcData + obj_cls_index , dstData + obj_cls_index , (1+classes)*dim.w*dim.h);
        }
        if (this->scaleXY != 1) scalAdd(dstData + bbox_index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);

mive93 commented 3 years ago

Yeah you are right, sorry my bad.

AlexeyAB / darknet

Accuracy and speed of yolov4x-mish #6987