Open RustyShackleford73 opened 1 year ago
Hello, I also encountered this problem, have you solved it?
Hi, may I have a look at your lr and loss curves in TensorBoard? My curves look bad. I fine-tuned a model from yolov7-w6-pose.pt on the COCO dataset plus a custom dataset (about 25% of the data), but the bbox predictions are not good enough (a single person gets multiple bboxes), and the learning rate curve looks wrong.
The train loss and val loss do not look good either.
Hyperparameters:
lr0: 0.01
lrf: 0.1  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 3.0  # warmup epochs (fractions ok)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias lr
Why is lr0 so large while lr1 is zero? -_-
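For anyone reading the lr curves against these hyperparameters, here is a minimal sketch of how a YOLOv5-style schedule would turn them into per-group learning rates. It assumes a cosine decay from lr0 to lr0 * lrf, and a linear warmup in which the bias group starts at warmup_bias_lr while the other groups start at 0. The values `epochs=300` and `nb=1000` (batches per epoch), and the choice of index 2 for the bias group, are assumptions for illustration; the group ordering in the pose branch may differ, so which TensorBoard curve starts high depends on that ordering.

```python
import math

# Values quoted in the comment above.
hyp = {"lr0": 0.01, "lrf": 0.1, "warmup_epochs": 3.0, "warmup_bias_lr": 0.1}


def one_cycle(y1, y2, steps):
    """Cosine ramp from y1 (epoch 0) to y2 (epoch == steps)."""
    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1


def lr_at(ni, epoch, group, epochs=300, nb=1000):
    """Learning rate of one parameter group at global iteration ni.

    epochs and nb are hypothetical; group 2 is assumed to be the bias group.
    """
    lf = one_cycle(1.0, hyp["lrf"], epochs)
    target = hyp["lr0"] * lf(epoch)  # scheduled lr for this epoch
    nw = max(round(hyp["warmup_epochs"] * nb), 1000)  # warmup iterations
    if ni <= nw:
        # Warmup: biases fall from warmup_bias_lr, other groups rise from 0.
        start = hyp["warmup_bias_lr"] if group == 2 else 0.0
        return start + (target - start) * ni / nw
    return target


for group in range(3):
    print(group, lr_at(0, 0, group), lr_at(300_000, 300, group))
```

Under these assumptions one group starts at 0.1 and the others at 0, and all of them end near lr0 * lrf. A curve that starts high next to one that starts at zero is therefore what this warmup produces, not necessarily a bug.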
My lr1 looks like that too. Is your val loss rising?
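@gadewegit Yes, the val loss curve is rising. [image] https://user-images.githubusercontent.com/33301898/226274571-847796b5-6530-4adc-99f5-e80269369646.png
And the lr curve: [image] https://user-images.githubusercontent.com/33301898/226274697-1dfaf4d8-fa9d-4126-aad9-938716908179.png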
Is there any way to fix the rising val loss? I don't know why my lr2 looks like this.
(Sent by email on 20 March 2023, 15:34, in reply to: [WongKinYiu/yolov7] val/obj loss and val/box loss keep raising in the training of yolo pose with coco dataset (Issue #1361))
@gadewegit Hmm, I'm trying a few things out.
OK, I hope we can stay in touch.
@gadewegit Hi, I fine-tuned a model on the COCO dataset plus a custom dataset (about 25% of the data, but only simple samples, and the backgrounds behind the people are not similar to COCO). So I trained only the parameters of the model's head and changed the training process: I removed the warmup stage, modified the learning rate schedule, and lowered the initial learning rate. In addition, according to some issues in TexasInstruments/edgeai-yolov5, changing kps_loss and increasing the scale factor weight in the loss function can alleviate the problem. Now the shifted keypoints no longer appear on either the COCO data or the custom data.
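As a rough illustration of "training only the head", here is a minimal sketch that marks every parameter outside a set of name prefixes as frozen. It is not the code used above. The prefix `model.118.` and the parameter names are hypothetical; in PyTorch you would pass `model.named_parameters()` and real prefixes taken from your own model. Stand-in objects are used so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace


def freeze_all_but_head(named_params, head_prefixes):
    """Leave only parameters whose name starts with a head prefix trainable.

    named_params: iterable of (name, param) pairs, e.g. model.named_parameters().
    head_prefixes: name prefixes identifying head layers (hypothetical here).
    Returns the list of names that remain trainable.
    """
    trainable = []
    for name, param in named_params:
        is_head = any(name.startswith(prefix) for prefix in head_prefixes)
        param.requires_grad = is_head  # frozen parameters receive no gradients
        if is_head:
            trainable.append(name)
    return trainable


# Stand-in parameters with made-up names, in place of a real model.
params = {
    "model.0.conv.weight": SimpleNamespace(requires_grad=True),
    "model.50.conv.weight": SimpleNamespace(requires_grad=True),
    "model.118.m.0.weight": SimpleNamespace(requires_grad=True),
}
trainable = freeze_all_but_head(params.items(), head_prefixes=["model.118."])
print(trainable)
```

With a real model, printing the returned names before training is a quick way to confirm that only the intended layers will be updated.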
Oh, I'm glad to hear that you have solved some problems. Do obj loss, box loss, and learning rate curves appear normal? May I have a look at your curve in the tensorboard?
@gadewegit This is the mAP@0.5:0.95 curve. Because of the pretraining, the mAP score converges early. 25% of the data is custom (with totally different backgrounds), and I think the model learned from it.
This is the train loss curve. The val loss curve starts to converge after some epochs.
There is a problem with the lr curve in TensorBoard: the shape of the curve is right, but the values are not correct.
I'm glad to see that your val loss has converged, but my val loss still has problems. Could you give me specific guidance? In addition, our lr curve is different. I hope to get your help. Thanks!
@gadewegit Could you attach your training command line and your train and val curves?