about the training - GitHub Issues

Usernamezhx commented 3 years ago

I trained my own model with the default values you provided. It showed me the following:

TNTWEN commented 3 years ago

Hello! I think your s (the sparsity factor passed via --s) may be too large. Your mAP was still rising before epoch 50. This may be due to insufficient basic training, but that has little effect. After 50 epochs, however, the accuracy kept declining, without any steady or upward trend, which indicates that s is too large. Normally the mAP declines for only a few epochs and then fluctuates within a certain range. So I think you need to reduce s: if your parameter is --s 0.001, you could set --s 0.0001 or --s 0.0005. In general, if your mAP has dropped for about 50 epochs with no steady or upward trend, the sparsity is too strong. You can stop training, reduce s, and start over.
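The rule of thumb above can be turned into a rough check over a per-epoch mAP history: stop if mAP has fallen for roughly 50 epochs with no plateau or recovery. This is a minimal sketch and is not part of the repository. The function name, the half-window comparison used to detect "still falling", and the `tol` value are all assumptions chosen for illustration.

```python
def sparsity_too_strong(map_history, window=50, tol=0.005):
    """Rough check of the rule of thumb: mAP has dropped for about `window`
    epochs without levelling off or recovering.

    map_history: per-epoch mAP values, oldest first.
    window=50 comes from the comment above; tol is an assumed threshold.
    """
    if len(map_history) <= window:
        return False  # not enough epochs yet to judge
    start = map_history[-window - 1]  # mAP just before the window
    recent = map_history[-window:]
    # No recovery: every epoch in the window stays below the starting mAP.
    never_recovered = all(m < start for m in recent)
    # Still falling: second half of the window averages clearly below the first half.
    half = window // 2
    first_avg = sum(recent[:half]) / half
    second_avg = sum(recent[half:]) / (window - half)
    still_falling = first_avg - second_avg > tol
    return never_recovered and still_falling


# Example: a steady decline over 60 epochs triggers the check,
# while a drop followed by a plateau does not.
print(sparsity_too_strong([0.6 - 0.005 * i for i in range(60)]))  # → True
print(sparsity_too_strong([0.6] * 10 + [0.4] * 50))               # → False
```

A plateau at a lower level returns False. This matches the note that mAP normally declines for a few epochs and then fluctuates within a range.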

By the way, which model are you using, and how large is your dataset? Thanks!

Usernamezhx commented 3 years ago

Thanks for your reply. I will train again with s = 0.0001. I am using AlexeyAB's YOLOv4. I have 10,000 images and 7 classes.

Usernamezhx commented 3 years ago

Hi. I set s to 0.0001 and trained the model, but it showed me the following:

It is very strange. I checked https://github.com/ultralytics/yolov3/issues but found nothing. I am using weights trained with AlexeyAB's YOLOv4. When I run

python train.py --cfg model/yolov4.cfg --data model/sexy.data --weights model/yolov4-best.weights --epochs 300 --batch-size 16 -sr --s 0.0001 --prune 1

it shows:

AssertionError: Unsupported fields ['max_delta'] in model/yolov4.cfg. See https://github.com/ultralytics/yolov3/issues/631

So I commented out the unsupported field, and then it ran successfully. Could this have anything to do with the problem? BTW, which version of YOLOv4 did you use? Thanks in advance.
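Commenting out the unsupported field by hand, as described here, can also be scripted. The sketch below is hypothetical. Only `max_delta` is named in the error message, so the default `unsupported` tuple contains just that key. Note that the PyTorch code will then simply ignore whatever behaviour that field controls in Darknet.

```python
def comment_out_fields(cfg_text, unsupported=("max_delta",)):
    """Prefix '#' to every `key=value` line of a Darknet-style cfg whose key
    is listed in `unsupported` (hypothetical helper; the key list is an assumption)."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if "=" in stripped and not stripped.startswith("#") and key in unsupported:
            out.append("#" + line)  # disable the field; the parser skips comment lines
        else:
            out.append(line)
    return "\n".join(out)


cfg = "[yolo]\nmask = 6,7,8\nmax_delta=5\nclasses=7"
print(comment_out_fields(cfg))
```

To apply it to a real file, read the cfg text, pass it through the function, and write the result to a new cfg so the original stays untouched.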

TNTWEN commented 3 years ago

I know why your mAP kept rising during the first 50 epochs: many details of the training process differ between Darknet and PyTorch. In general, though, this will not have much impact.

All of my training was done in PyTorch. I also recommend doing all of your training in the same codebase, such as https://github.com/TNTWEN/Pruned-OpenVINO-YOLO/tree/main/Pruneyolov3v4, because https://github.com/ultralytics/yolov3 is updated too frequently.

But I think your current training is normal. You set 300 epochs, so the program will reduce the learning rate at epochs 210 and 270. Although the mAP is flat now, sparse training is still proceeding normally. Your mAP should gradually recover after epochs 210 and 270.
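Epochs 210 and 270 correspond to 70% and 90% of the 300 training epochs. A step schedule of that shape can be sketched as follows. The 0.7/0.9 fractions are inferred from those numbers. The initial rate `lr0` and the decay factor `gamma=0.1` are assumptions, not values taken from the training script.

```python
def lr_at_epoch(epoch, total_epochs=300, lr0=0.001, gamma=0.1, fractions=(0.7, 0.9)):
    """Step learning-rate schedule: multiply lr0 by gamma at each milestone.

    lr0 and gamma are illustrative assumptions; the fractions reproduce the
    milestones 210 and 270 mentioned for a 300-epoch run.
    """
    milestones = [round(total_epochs * f) for f in fractions]  # 300 -> [210, 270]
    drops = sum(epoch >= m for m in milestones)  # milestones already passed
    return lr0 * gamma ** drops


for e in (0, 209, 210, 269, 270, 299):
    print(e, lr_at_epoch(e))
```

In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[210, 270], gamma=0.1)` provides the same step behaviour. Whether the script uses exactly these settings is an assumption.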

TNTWEN commented 3 years ago

AssertionError: Unsupported fields ['max_delta'] in model/yolov4.cfg. See https://github.com/ultralytics/yolov3/issues/631

I haven't encountered this problem before, so I can't determine why this error was triggered.

Usernamezhx commented 3 years ago

But I think your current training is normal. You set 300 epochs, so the program will reduce the learning rate at epochs 210 and 270. Although the mAP is flat now, sparse training is still proceeding normally. Your mAP should gradually recover after epochs 210 and 270.

But it didn't work: the mAP didn't change between epochs 210 and 270. So I tried training YOLOv4 with your script https://github.com/TNTWEN/Pruned-OpenVINO-YOLO/tree/main/Pruneyolov3v4 to finish the basic model training. The cfg is Pruneyolov3v4/cfg/yolov4.cfg, in which I changed classes=80 to my number of classes and filters=255 to filters=(classes + 5)x3. The pretrained model is from AlexeyAB: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights. The training command:

python train.py --cfg cfg/yolov4.cfg --data model/sexy.data --weights model/yolov4.weights --epochs 300 --batch-size 16

without the pruning parameters. It showed me the following:

This differs from the Darknet results:

 for conf_thresh = 0.25, precision = 0.73, recall = 0.78, F1-score = 0.76
 for conf_thresh = 0.25, TP = 3274, FP = 1188, FN = 921, average IoU = 56.98 %

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
 mean average precision (mAP@0.50) = 0.657398, or 65.74 %
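As a consistency check, the precision, recall, and F1 in the Darknet summary follow directly from the TP/FP/FN counts at conf_thresh = 0.25. The sketch below uses the standard definitions. mAP@0.50 cannot be recomputed this way, because it is the area under the precision-recall curve across all confidence thresholds, not a single-threshold ratio.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)  # fraction of detections that are correct
    recall = tp / (tp + fn)     # fraction of ground-truth objects that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Counts copied from the Darknet output above (conf_thresh = 0.25).
p, r, f1 = detection_metrics(tp=3274, fp=1188, fn=921)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# → precision=0.73 recall=0.78 F1=0.76
```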

Does this look normal? Thanks in advance.
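The filters=(classes + 5)x3 edit follows from what each YOLO head predicts. For each of its 3 anchors, it outputs 4 box coordinates, 1 objectness score, and one score per class. The small hypothetical helper below computes the value for the [convolutional] layer directly before each [yolo] layer. The matching classes= line goes in each [yolo] block.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Output channels of the conv layer feeding a YOLO head.

    Each anchor predicts 4 box coordinates + 1 objectness score
    + one score per class.
    """
    return (num_classes + 5) * anchors_per_scale


print(yolo_head_filters(80))  # → 255 (default COCO cfg)
print(yolo_head_filters(7))   # → 36 (the 7-class dataset described earlier in this thread)
```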

TNTWEN commented 3 years ago

Could you show me the results of yesterday's training after 190 epochs?

Usernamezhx commented 3 years ago

Sorry for the late reply; my graphics card driver stopped working.

Could you show me the results of yesterday's training after 190 epochs?

So I trained the model again with s = 0.0001. It showed me the following:

TNTWEN commented 3 years ago

Indeed, very few people have tried to prune YOLOv4 so far, and some strange problems may occur during training. I have also discussed this with other users: there are many problems, especially when the training set is relatively small. Even with my training set of 50,000 images, the mAP suddenly dropped to 0, though fortunately it gradually recovered. So pruning support for YOLOv4 needs more testing and improvement. I think you can try the following:

1: Although your mAP after sparse training has dropped a lot, the degree of sparsity is still good. You could try pruning channels and layers, and then see to what extent the pruned model's accuracy can be recovered by fine-tuning.

2: Sparse training may fail to reach the desired results for many reasons (for example, everyone's training set is different). At present, yolov3/yolov3-spp are better supported; maybe you can also try yolov4-relu. If you are not in a hurry to deploy it and have sufficient GPU resources, you can keep experimenting with YOLOv4. For example, use https://github.com/tanluren/yolov3-channel-and-layer-pruning to prune YOLOv4 and see whether the same problem occurs.
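Suggestion 1 (prune channels, then fine-tune) usually relies on the BN scale factors (gamma) that sparse training pushes toward zero. The sketch below assumes a network-slimming-style global threshold. It collects |gamma| from every prunable BatchNorm layer, cuts the smallest `prune_ratio` fraction, and keeps at least one channel per layer. It is an illustration only, not the algorithm of this repository or of tanluren/yolov3-channel-and-layer-pruning. The function names and the `eps` used to measure sparsity are assumptions. In PyTorch, the gammas would come from each `nn.BatchNorm2d` layer's `weight`.

```python
def sparsity_fraction(bn_gammas, eps=1e-2):
    """Share of BN scale factors whose magnitude is below eps (eps is an assumption)."""
    values = [abs(g) for gammas in bn_gammas.values() for g in gammas]
    return sum(v < eps for v in values) / len(values)


def global_prune_masks(bn_gammas, prune_ratio):
    """bn_gammas: dict layer_name -> list of BN gammas.
    Returns dict layer_name -> list of bools (True = keep channel)."""
    all_abs = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    cut = int(len(all_abs) * prune_ratio)
    # Channels with |gamma| at or above this global threshold are kept.
    threshold = all_abs[cut] if cut < len(all_abs) else float("inf")
    masks = {}
    for name, gammas in bn_gammas.items():
        mask = [abs(g) >= threshold for g in gammas]
        if gammas and not any(mask):
            # Never empty a layer completely: keep its largest-|gamma| channel.
            best = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
            mask[best] = True
        masks[name] = mask
    return masks


gammas = {"conv1": [0.9, 0.001, 0.5, 0.002], "conv2": [0.003, 0.004], "conv3": [0.8, 0.7]}
print(sparsity_fraction(gammas))         # → 0.5
print(global_prune_masks(gammas, 0.5))
```

Real pruning tools also have to keep channel counts consistent across shortcut (residual) connections and handle layer pruning separately; this sketch ignores both.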

Anyway, thank you so much for trying!!!

TNTWEN / Pruned-OpenVINO-YOLO

about the training #4