Why val/loss was 0? - Githubissues

WongKinYiu / yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

GNU General Public License v3.0

8.59k stars, 1.31k forks

Why val/loss was 0? #379

Open Godk02 opened 2 months ago

Godk02 commented 2 months ago

I am confused about the results.png.

jasfa commented 2 months ago

I have the same problem. Has anyone solved it?

Godk02 commented 2 months ago

> I have the same problem. Has anyone solved it?

Maybe it's just a bug.

jasfa commented 2 months ago

I solved it.

Godk02 commented 2 months ago

> I solved it.

Thanks.

shahriarahmadf commented 2 months ago

What do you mean by "solved it"? What did you use for training instead of train_dual.py? I can see validation results using val_dual.py, but after training I still don't get a results.jpg that shows both train loss and val loss (to evaluate overfitting/underfitting). What's your solution?

Godk02 commented 2 months ago

> What do you mean by "solved it"? What did you use for training instead of train_dual.py? I can see validation results using val_dual.py, but after training I still don't get a results.jpg that shows both train loss and val loss (to evaluate overfitting/underfitting). What's your solution?

He means the author said that the val/loss calculations in train_dual.py are turned off. That doesn't really solve the problem.

shahriarahmadf commented 2 months ago

Thanks

Laocaifeng666 commented 2 months ago

> What do you mean by "solved it"? What did you use for training instead of train_dual.py? I can see validation results using val_dual.py, but after training I still don't get a results.jpg that shows both train loss and val loss (to evaluate overfitting/underfitting). What's your solution?

I think his intention is to uncomment the code that was commented out, as described in the author's reply. But when I directly uncommented the loss computation (computeloss) in the segment val_dual.py, it resulted in an error. Since I only do segmentation, I haven't tried changing the YOLOv9 val_dual.py directly, so I don't know whether that works.

tomyvazquez commented 1 month ago

If you are using train_dual.py you will see that val loss values are always 0. To correct it, you have to change some things in val_dual.py, at line 189. It should be like this:


        # Inference
        with dt[1]:
            preds = model(im) if compute_loss else (model(im, augment=augment), None)

        # Loss
        if compute_loss:
            # preds = preds[1]
            # train_out = train_out[1]
            loss += compute_loss(preds, targets)[1]  # box, obj, cls
        else:
            preds = preds[0][1]

shahriarahmadf commented 1 month ago

Even with train_out commented out, the code still gives an error. If you were able to finish training without errors using your modification, could you share a link to your val_dual.py on GitHub?

tomyvazquez commented 1 month ago

I commented out train_out because it is no longer needed, since I'm using preds to calculate the loss. With those corrections in val_dual.py, you can train with no errors.

shahriarahmadf commented 1 month ago

> I commented out train_out because it is no longer needed, since I'm using preds to calculate the loss. With those corrections in val_dual.py, you can train with no errors.

Can you please share your val_dual.py file from your GitHub? I tried your technique, but it shows an error during training and stops.

bigguaner commented 1 month ago

> I commented out train_out because it is no longer needed, since I'm using preds to calculate the loss. With those corrections in val_dual.py, you can train with no errors.

I followed your corrections exactly, but training fails with this error:

        in non_max_suppression
            device = prediction.device
        AttributeError: 'list' object has no attribute 'device'
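
The traceback above says that a plain Python list reached a line expecting a tensor. The sketch below reproduces that failure mode without PyTorch and shows one defensive way to unwrap a nested output. `FakeTensor` and `unwrap_prediction` are hypothetical names used only for illustration, and always taking the first element is an assumption: the thread does not confirm which element of the model output is the right one to pass to non_max_suppression.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tensor; only the `device` attribute matters here."""
    device: str = "cpu"


def unwrap_prediction(prediction):
    """Hypothetical helper: descend into nested lists/tuples, taking the
    first element each time, until a non-container object is reached."""
    while isinstance(prediction, (list, tuple)):
        if not prediction:
            raise ValueError("empty prediction container")
        prediction = prediction[0]
    return prediction


# A nested output shaped like "list of lists of tensors" (assumed shape).
nested = [[FakeTensor("cuda:0"), FakeTensor("cuda:0")], None]

try:
    nested.device  # same failure as in the traceback: lists have no .device
except AttributeError as exc:
    message = str(exc)

picked = unwrap_prediction(nested)
print(message)
print(picked.device)
```

If the failure appears only when compute_loss is set, a quick check is to print `type(preds)` just before the non_max_suppression call and compare it with the branch where compute_loss is not set.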

WEllin06 commented 1 month ago

I succeeded!!! You're awesome!

mo-lx commented 1 month ago

> I succeeded!!! You're awesome!

How can I modify the code to successfully restore validation losses to normal?

WEllin06 commented 1 month ago

If you are using train_dual.py you will see that val loss values are always 0. To correct it, you have to change some things in val_dual.py, at line 189. It should be like this:
        # Inference
        with dt[1]:
            preds = model(im) if compute_loss else (model(im, augment=augment), None)

        # Loss
        if compute_loss:
            # preds = preds[1]
            # train_out = train_out[1]
            loss += compute_loss(preds, targets)[1]  # box, obj, cls
        else:
            preds = preds[0][1]

By doing this, you can fix the problem. Just change the validation script's code (val_dual.py) as shown above, and then run train_dual.py. Afterwards, open the results.csv file and you will find that the val loss values have changed from 0 to positive numbers. As training continues, the val loss will decrease.
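
To check the CSV without opening it by hand, here is a minimal sketch using only the standard library. It assumes the val loss columns have headers that start with `val/` and end with `loss` (possibly padded with spaces); the exact column names in your run may differ, and `val_loss_columns` and `all_zero` are hypothetical helper names.

```python
import csv
import io


def val_loss_columns(csv_text):
    """Return {column_name: [float values]} for every column whose
    stripped header starts with 'val/' and ends with 'loss'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip() for h in next(reader)]
    wanted = [i for i, h in enumerate(header)
              if h.startswith("val/") and h.endswith("loss")]
    columns = {header[i]: [] for i in wanted}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for i in wanted:
            columns[header[i]].append(float(row[i]))
    return columns


def all_zero(columns):
    """Map each column name to True if every logged value is exactly 0."""
    return {name: all(v == 0.0 for v in values)
            for name, values in columns.items()}


# Made-up sample data in the assumed layout, not real training output.
sample = """   epoch, train/box_loss, val/box_loss, val/cls_loss
       0, 1.5, 0, 0
       1, 1.2, 0, 0
"""

report = all_zero(val_loss_columns(sample))
print(report)
```

For a real run, read the file contents first, for example `val_loss_columns(open(path).read())`, where `path` points to the results file in your training output directory. A column reported as `True` is still stuck at 0.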