Open johannes-tum opened 3 months ago
Good day!
Thank you for reporting the issue and providing your solution. I believe this is caused by a bug in the loss calculation: I forgot to detach the predicted tensor before the BoxMatcher finds the corresponding bbox, so gradients could flow back through the matching step. This occurs in https://github.com/WongKinYiu/YOLO/blob/868c821de803cf5cfbf3e5d7d48571fc3015616e/yolo/tools/loss_functions.py#L91
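To illustrate what detaching the predictions during matching means, here is a minimal sketch. It is not the repository's BoxMatcher: `match_weights`, the distance-based weight, and the toy tensors are all hypothetical stand-ins. The idea is only that the matching result is treated as a constant, so the loss gradient reaches the predictions through the regression term alone.

```python
import torch

def match_weights(pred_boxes: torch.Tensor, target_box: torch.Tensor) -> torch.Tensor:
    """Hypothetical matcher: score each predicted box by closeness to the target."""
    # Detach so the matching score is a constant with respect to the predictions.
    pred = pred_boxes.detach()
    dist = (pred - target_box).abs().sum(dim=-1)  # L1 distance per predicted box
    return torch.exp(-dist)                       # closer boxes get larger weights

# Toy data (assumed values, for illustration only).
pred_boxes = torch.tensor(
    [[0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0]], requires_grad=True
)
target_box = torch.tensor([[2.1, 1.9, 3.1, 3.2]])

weights = match_weights(pred_boxes, target_box)
per_box_loss = (pred_boxes - target_box).abs().sum(dim=-1)
loss = (weights * per_box_loss).sum()
loss.backward()

# The weights carry no autograd history, so the gradient is just
# weight * d(per_box_loss)/d(pred_boxes).
expected_grad = weights[:, None] * torch.sign(pred_boxes.detach() - target_box)
```

Without the `detach()` call, the gradient would also include a term from differentiating the weights themselves, which changes what the optimizer sees.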
I have fixed these bugs in commit 4775b4c6b1040e41ab38fe35a51099dcb9299417, but I'm not entirely sure if everything is resolved. I tried training the model on a small dataset, and it seems to be working correctly now. However, some data augmentations are still under development.
For now, I strongly recommend training with the original YOLOv9 repo to avoid wasting GPU resources. I will release version 1.0 once most of the code is complete.
Best regards, Henry Tsui
All right! Thanks! I know these things are hard to predict, but do you have a rough time frame in mind when v1 might be ready?
Issue Description
I tried to train the network on another private dataset, starting by overfitting on a single image. I noticed that many optimizer steps are skipped because of invalid gradients. As a consequence, the network did not really converge even after 500 epochs. Once I added this block
after self.scaler.scale(loss).backward() it worked better. But I guess there must be a better way than this.
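For context, a generic mixed-precision training step is sketched below, showing where gradient handling fits after `scaler.scale(loss).backward()`. This is not the block from the post, and it is not the repository's trainer: the toy model, data, learning rate, and `max_norm` value are assumptions. `GradScaler.step` skips the optimizer step when it finds inf or NaN gradients, which matches the skipped steps described above. Note that clipping bounds large gradients but does not repair NaN ones.

```python
import torch
from torch import nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # the scaler is a no-op when AMP is disabled

model = nn.Linear(4, 1).to(device)  # toy stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)

def train_step() -> float:
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()
    # Unscale first so the clipping threshold applies to the true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    # step() skips the update if any gradient is inf/NaN; update() adjusts the scale.
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

losses = [train_step() for _ in range(50)]
```

If many steps are still skipped with a setup like this, the non-finite values probably originate in the loss itself rather than in the scaling, which is consistent with the loss-calculation bug mentioned in the reply.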