I used the csdarknet53-omega.cfg file to train a classification network (not detection) on my own dataset. With label_smooth_eps=0.1, the loss increases gradually no matter what learning rate I set, but without label_smooth_eps=0.1 the loss converges. Why is that?
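One likely part of the explanation: with label smoothing the target distribution is no longer one-hot, so cross-entropy has a nonzero floor and actually grows as the network becomes more confident in the true class. The sketch below (my own illustration, not darknet's source; I'm assuming the common smoothing rule target = (1-eps)·one_hot + eps/classes) shows this for 5 classes and eps=0.1:

```python
import math

def smoothed_ce(probs, true_idx, eps, k):
    """Cross-entropy against a label-smoothed target distribution."""
    # Smoothed target: spread eps/k over all classes, (1-eps) extra on the true one.
    target = [eps / k] * k
    target[true_idx] += 1.0 - eps
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))

K = 5
# A nearly fully confident prediction on the true class.
confident = [1e-12] * K
confident[0] = 1.0 - 4e-12

# Hard labels (eps=0): confident predictions drive the loss toward zero.
print(smoothed_ce(confident, 0, 0.0, K))  # ~0.0

# With eps=0.1 the same confident prediction is penalized heavily,
# so as training sharpens the softmax, the reported loss can rise.
print(smoothed_ce(confident, 0, 0.1, K))  # > 2

# The minimum is reached when probs equal the smoothed target itself,
# and even that minimum is nonzero (~0.39 here), not 0.
matched = [0.02] * K
matched[0] = 0.92
print(smoothed_ce(matched, 0, 0.1, K))    # ~0.39
```

So a rising loss under smoothing does not necessarily mean the model is getting worse; checking validation accuracy alongside the loss is a better signal.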
By the way, I have five categories with roughly 1,800 images per category. How many training iterations (max_batches) and what batch size should I set?
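For sizing, it may help to convert max_batches into epochs over the dataset. The numbers below are an assumed starting point (batch=64, max_batches=10000 are hypothetical values, not a recommendation from the repo), just to show the arithmetic:

```python
# Relate darknet's max_batches to full passes over this dataset.
num_images = 5 * 1800      # five classes, ~1800 images each
batch = 64                 # images consumed per training iteration (assumed)
max_batches = 10000        # total iterations (assumed starting point)

epochs = max_batches * batch / num_images
print(round(epochs, 1))    # ~71.1 passes over the dataset
```

From there you can tune max_batches up or down depending on how many epochs the validation accuracy needs to plateau.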