Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Epoch number #1085

Open Rohinivv96 opened 1 year ago

Rohinivv96 commented 1 year ago

💡 Your Question

I have started training YOLO-NAS on a custom dataset, but the current epoch number is not displayed while training runs. What should I change so that the epoch number is shown during training?

Versions

No response

BloodAxe commented 1 year ago

What output is this? Stdout? Logfile?

Usually during training you should be seeing nice output with loss, epoch number, metrics, etc in your console:

===========================================================
Train epoch 33: 100%|██████████| 250/250 [05:10<00:00,  1.24s/it, DEKRLoss/heatmap=0.00039, DEKRLoss/offset=0.000749, DEKRLoss/total=0.00114, gpu_mem=13.4]
Validation epoch 33: 100%|██████████| 20/20 [00:24<00:00,  1.23s/it]
===========================================================
SUMMARY OF EPOCH 33
├── Training
│   ├── Dekrloss/heatmap = 0.0004
│   │   ├── Best until now = 0.0004 (↘ -0.0)
│   │   └── Epoch N-1      = 0.0004 (↘ -0.0)
│   ├── Dekrloss/offset = 0.0007
│   │   ├── Best until now = 0.0007 (↗ 0.0)
│   │   └── Epoch N-1      = 0.0007 (↗ 0.0)
│   └── Dekrloss/total = 0.0011
│       ├── Best until now = 0.0011 (↘ -0.0)
│       └── Epoch N-1      = 0.0011 (↘ -0.0)
└── Validation
    ├── Ap = 0.351
    │   ├── Best until now = 0.3775 (↘ -0.0265)
    │   └── Epoch N-1      = 0.3705 (↘ -0.0195)
    ├── Ar = 0.4672
    │   ├── Best until now = 0.4898 (↘ -0.0226)
    │   └── Epoch N-1      = 0.4898 (↘ -0.0226)
    ├── Dekrloss/heatmap = 0.0003
    │   ├── Best until now = 0.0002 (↗ 0.0)
    │   └── Epoch N-1      = 0.0002 (↗ 0.0)
    ├── Dekrloss/offset = 0.0008
    │   ├── Best until now = 0.0007 (↗ 0.0)
    │   └── Epoch N-1      = 0.0007 (↗ 0.0)
    └── Dekrloss/total = 0.001
        ├── Best until now = 0.001  (↗ 0.0001)
        └── Epoch N-1      = 0.001  (↗ 0.0001)

BloodAxe commented 1 year ago

@Rohinivv96 could this be related to your issue? https://github.com/Deci-AI/super-gradients/issues/1082#issuecomment-1561991026

Rohinivv96 commented 1 year ago

@BloodAxe Yes. I ran the following command for training: trainer.train(model=model, training_params=train_params, train_loader=train_data, valid_loader=val_data). However, I am not getting the console output you showed above; I only get the output in the attached screenshot.

BloodAxe commented 1 year ago

Is it stdout or log file? Colab or launch from command line? DDP or single GPU? Linux or Windows? Please provide as much information as possible at once, since it's hard to guess in what environment you're doing your experiment and counter-productive to go back and forth with each question.
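One way to answer all of these questions at once is to run a short script in the same environment as the training job and paste its output into the issue. The following is a minimal sketch, not part of super-gradients: the function name `describe_environment` is hypothetical, the Colab check is a common heuristic rather than an official API, and reading `WORLD_SIZE` assumes the job was started by a launcher that sets that variable (as `torchrun` does).

```python
import os
import platform
import sys


def describe_environment() -> dict:
    """Collect the environment details asked for above (hypothetical helper)."""
    # isatty() is True for an interactive terminal and False when stdout is
    # redirected to a file or captured by a notebook.
    is_terminal = bool(getattr(sys.stdout, "isatty", lambda: False)())
    return {
        "os": platform.system(),  # e.g. "Linux" or "Windows"
        "python": platform.python_version(),
        "stdout_is_terminal": is_terminal,
        # Heuristic: the Colab runtime imports google.colab at startup.
        "running_in_colab": "google.colab" in sys.modules,
        # Assumption: a distributed launcher exports WORLD_SIZE; when the
        # variable is absent we treat the run as a single process.
        "world_size": int(os.environ.get("WORLD_SIZE", "1")),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

A `world_size` above 1 suggests a multi-process (DDP) run, where progress output may appear only for one process.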

harpreetsahota204 commented 1 year ago

Hi @Rohinivv96, have you tried setting silent_mode to False in train_params? That should print all the information for you.
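As a rough sketch of that suggestion, assuming `train_params` is a plain dictionary: the helper `with_console_output` and the example values are hypothetical, and the `trainer.train(...)` call is left as a comment because it needs the model and data loaders from the original command.

```python
def with_console_output(training_params: dict) -> dict:
    """Return a copy of training_params with silent_mode set to False."""
    updated = dict(training_params)  # copy so the caller's dict is untouched
    updated["silent_mode"] = False
    return updated


# Hypothetical values standing in for the real training parameters.
train_params = {"max_epochs": 10, "silent_mode": True}
train_params = with_console_output(train_params)
print(train_params["silent_mode"])

# Then train as in the command quoted earlier in this thread:
# trainer.train(model=model, training_params=train_params,
#               train_loader=train_data, valid_loader=val_data)
```

If the epoch summary still does not appear with silent_mode set to False, the cause is likely elsewhere, for example in how the environment captures stdout.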

BloodAxe commented 1 year ago

Related issue: https://github.com/Deci-AI/super-gradients/issues/1289