Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

YoloNAS training time #2056

Open shekarneo opened 5 days ago

shekarneo commented 5 days ago

💡 Your Question

I am training the YOLO-NAS keypoint detection model on a custom dataset with 1000 images, and training takes 45 minutes per epoch. Is the model using the original image size or something like 640x640? Or is this behavior normal?

Versions

No response

BloodAxe commented 5 days ago

It depends on many factors: model size, the number of workers you have set, your GPU and your CPU.
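
At 45 minutes per epoch over 1000 images, training spends about 45 × 60 / 1000 = 2.7 seconds per image. One way to tell whether that time goes into data loading (CPU, number of workers) or into the model (GPU) is to iterate the training data loader on its own, without running the model. A minimal sketch, assuming `train_loader` (a hypothetical name) is the PyTorch `DataLoader` used for training; any iterable works:

```python
import time
from typing import Iterable


def mean_seconds_per_batch(batches: Iterable, max_batches: int = 50) -> float:
    """Average wall-clock seconds needed to produce one batch, without running a model.

    Iterates over `batches` (for example a PyTorch DataLoader) and stops after
    `max_batches` items, so decoding, augmentation and collation are timed alone.
    """
    if max_batches < 1:
        raise ValueError("max_batches must be at least 1")
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
        if count >= max_batches:
            break
    if count == 0:
        raise ValueError("the iterable produced no batches")
    return (time.perf_counter() - start) / count


# Per-image time implied by the numbers in this thread: 45 min/epoch, 1000 images.
seconds_per_image = 45 * 60 / 1000  # 2.7 s
```

If `mean_seconds_per_batch(train_loader)` comes out close to the measured time per training step, the bottleneck is most likely data loading, and raising `num_workers` on the `DataLoader` is the first thing to try; if it is much smaller, the time is being spent on the GPU side. The first batches include worker start-up overhead, which is why the sketch averages over several dozen batches.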

shekarneo commented 2 days ago

Okay, after training for 20 epochs my AP was 62% and AR was 100%, and still no keypoints are detected.

shekarneo commented 2 days ago

I annotated the dataset using the CVAT tool and exported it to COCO format. Below are the training results.

SUMMARY OF EPOCH 24
├── Train
│   ├── Yolonasposeloss/loss_cls = 2.464
│   │   ├── Epoch N-1      = 2.6261 (↘ -0.1621)
│   │   └── Best until now = 2.6261 (↘ -0.1621)
│   ├── Yolonasposeloss/loss_iou = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   ├── Yolonasposeloss/loss_dfl = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   ├── Yolonasposeloss/loss_pose_cls = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   ├── Yolonasposeloss/loss_pose_reg = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   └── Yolonasposeloss/loss = 2.464
│       ├── Epoch N-1      = 2.6261 (↘ -0.1621)
│       └── Best until now = 2.6261 (↘ -0.1621)
└── Validation
    ├── Yolonasposeloss/loss_cls = nan
    │   ├── Epoch N-1      = nan    (= nan)
    │   └── Best until now = nan    (= nan)
    ├── Yolonasposeloss/loss_iou = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss_dfl = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss_pose_cls = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss_pose_reg = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss = nan
    │   ├── Epoch N-1      = nan    (= nan)
    │   └── Best until now = nan    (= nan)
    ├── Ap = 0.5355
    │   ├── Epoch N-1      = 0.5786 (↘ -0.0431)
    │   └── Best until now = 0.8088 (↘ -0.2732)
    └── Ar = 1.0
        ├── Epoch N-1      = 1.0    (= 0.0)
        └── Best until now = 1.0    (= 0.0)
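
In the summary above, the IoU, DFL and both pose loss terms stay at exactly 0.0 in training and validation, while the validation classification loss is NaN. A minimal sketch for flagging such terms when scanning logs, assuming the loss values have already been extracted into a plain dict (the key names are shortened from the log, and the parsing step is not shown). A regression term that is exactly zero epoch after epoch is worth investigating before looking at AP/AR:

```python
import math


def flag_degenerate_losses(metrics: dict) -> dict:
    """Return {name: reason} for loss terms that are NaN or exactly zero."""
    flags = {}
    for name, value in metrics.items():
        if math.isnan(value):
            flags[name] = "nan"
        elif value == 0.0:
            flags[name] = "zero"
    return flags


# Validation loss terms copied from the epoch 24 summary.
validation = {
    "loss_cls": float("nan"),
    "loss_iou": 0.0,
    "loss_dfl": 0.0,
    "loss_pose_cls": 0.0,
    "loss_pose_reg": 0.0,
}
flags = flag_degenerate_losses(validation)
```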

and my yaml file is

num_joints: 12

oks_sigmas: [0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.072, 0.072, 0.025, 0.025, 0.025]

edge_links:
  - [3,9]
  - [10,2]
  - [0,7]
  - [9,6]
  - [4,0]
  - [10,7]
  - [1,2]
  - [11,5]
  - [4,9]
  - [7,8]
  - [1,11]
  - [11,4]
  - [6,10]
  - [8,3]

edge_colors:
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]

keypoint_colors:
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 250, 55]
  - [250, 250, 55]
  - [250, 250, 55]
  - [250, 250, 55]
  - [250, 250, 55]
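
A quick way to sanity-check a config like this is to verify that its lists agree with each other: one OKS sigma and one colour per joint, one colour per edge, and edge indices inside the joint range. A minimal sketch, assuming the YAML has been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and that `edge_links` holds 0-based joint indices, as the values above suggest; `check_pose_config` is a hypothetical helper, not part of SuperGradients. It only checks internal consistency and cannot tell whether the joint order matches the order in the exported JSON:

```python
def check_pose_config(cfg: dict) -> list:
    """Return human-readable problems found in a keypoint dataset config dict."""
    problems = []
    k = cfg["num_joints"]

    # One OKS sigma and one colour per joint.
    for key in ("oks_sigmas", "keypoint_colors"):
        n = len(cfg.get(key, []))
        if n != k:
            problems.append(f"{key} has {n} entries, expected num_joints={k}")

    if any(s <= 0 for s in cfg.get("oks_sigmas", [])):
        problems.append("oks_sigmas must all be positive")

    # One colour per edge.
    links = cfg.get("edge_links", [])
    colors = cfg.get("edge_colors", [])
    if len(colors) != len(links):
        problems.append(f"edge_colors has {len(colors)} entries but edge_links has {len(links)}")

    # Assumption: edge_links hold 0-based joint indices.
    for a, b in links:
        if not (0 <= a < k and 0 <= b < k):
            problems.append(f"edge [{a}, {b}] references a joint outside 0..{k - 1}")
        elif a == b:
            problems.append(f"edge [{a}, {b}] connects a joint to itself")
    return problems


# The config posted above, written as a Python dict.
thread_config = {
    "num_joints": 12,
    "oks_sigmas": [0.025] * 7 + [0.072] * 2 + [0.025] * 3,
    "edge_links": [[3, 9], [10, 2], [0, 7], [9, 6], [4, 0], [10, 7], [1, 2],
                   [11, 5], [4, 9], [7, 8], [1, 11], [11, 4], [6, 10], [8, 3]],
    "edge_colors": [[214, 39, 40]] * 14,
    "keypoint_colors": [[250, 50, 83]] * 7 + [[250, 250, 55]] * 5,
}
```

For the config above, `check_pose_config(thread_config)` returns an empty list, so the lengths and index ranges are consistent.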

BloodAxe commented 2 days ago

What worries me in the reported losses is the zero values of the pose and bbox regression terms. This may indicate that there are no matches between the ground-truth boxes/poses and the boxes/poses predicted by the model. Can you attach an example image and the annotation JSON you exported? I would first double-check that there are no export issues.
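
Export problems of this kind can often be spotted directly in the JSON. In the standard COCO keypoints format, each annotation stores `keypoints` as a flat list `[x1, y1, v1, x2, y2, v2, ...]` (v = 0 not labeled, 1 labeled but not visible, 2 labeled and visible), `num_keypoints` counts the labeled joints, and `bbox` is `[x, y, width, height]`. A minimal sketch of a checker for those fields, assuming the training dataset reads the standard COCO keypoint fields; `check_coco_keypoints` is a hypothetical helper, not a SuperGradients function:

```python
def check_coco_keypoints(coco: dict, num_joints: int) -> list:
    """Return human-readable problems found in a COCO keypoints annotation dict."""
    problems = []
    image_ids = {img["id"] for img in coco.get("images", [])}

    for cat in coco.get("categories", []):
        names = cat.get("keypoints", [])
        if len(names) != num_joints:
            problems.append(
                f"category {cat.get('id')}: {len(names)} keypoint names, expected {num_joints}"
            )

    annotations = coco.get("annotations", [])
    if not annotations:
        problems.append("no annotations found")

    for ann in annotations:
        aid = ann.get("id")
        kps = ann.get("keypoints", [])
        # Flat list of (x, y, v) triples, one triple per joint.
        if len(kps) != 3 * num_joints:
            problems.append(f"annotation {aid}: {len(kps)} keypoint values, expected {3 * num_joints}")
            continue
        labeled = sum(1 for v in kps[2::3] if v > 0)
        if labeled == 0:
            problems.append(f"annotation {aid}: every keypoint has v == 0")
        if "num_keypoints" in ann and ann["num_keypoints"] != labeled:
            problems.append(f"annotation {aid}: num_keypoints={ann['num_keypoints']} but {labeled} are labeled")
        bbox = ann.get("bbox")
        # A missing or zero-size box would likely receive no positive matches.
        if not bbox or len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {aid}: missing or degenerate bbox {bbox}")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {aid}: image_id {ann.get('image_id')} not in images")
    return problems
```

Usage, with the training file from this thread: `with open("Training.json") as f: print(check_coco_keypoints(json.load(f), num_joints=12))`. An empty list means none of these particular problems were found, not that the export is guaranteed to be correct.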

You are probably aware of it, but this notebook shows fine-tuning of YoloNAS-Pose on an animal pose dataset, and it works well: https://github.com/Deci-AI/super-gradients/blob/master/notebooks/YoloNAS_Pose_Fine_Tuning_Animals_Pose_Dataset.ipynb. So my best guess is that the root cause of your problem is the data.

shekarneo commented 2 days ago

Hi, I am not able to share the images here, but I can share the annotation files, and I am also attaching the Python training script.

yolo_nas_pose_fine_tuning_custom_dataset.py.txt Training.json.txt validation.json.txt

shekarneo commented 2 days ago

And in my case, I don't need any joint connections (edges) between the keypoints.

Sriparna2024 commented 2 days ago

Hi, I am also facing a similar issue. Could you please explain how to export a .json file from the CVAT tool that is compatible with YOLO-NAS Pose? Is the .yaml file posted above correctly formatted?
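
Whatever export path is used, the resulting file should follow the standard COCO keypoints layout. A minimal sketch that builds one such file in Python, to show which fields go where; this is the generic COCO format, not a CVAT- or SuperGradients-specific schema. `make_annotation`, `NUM_JOINTS`, the file name, category name and keypoint names are hypothetical placeholders, and all coordinates are made-up illustration values. Presumably the order of the `keypoints` names should match the joint order assumed by `oks_sigmas` and `edge_links` in the YAML:

```python
import json

NUM_JOINTS = 12  # num_joints from the config posted in this thread


def make_annotation(ann_id, image_id, points, bbox, category_id=1):
    """Build one COCO keypoint annotation.

    points: one (x, y, v) triple per joint, where v is 0 (not labeled),
    1 (labeled but not visible) or 2 (labeled and visible).
    bbox: [x, y, width, height] in pixels.
    """
    if len(points) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} points, got {len(points)}")
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "keypoints": [c for point in points for c in point],  # flat [x1, y1, v1, ...]
        "num_keypoints": sum(1 for _, _, v in points if v > 0),
        "bbox": bbox,
        "area": bbox[2] * bbox[3],  # approximated by the bbox area in this sketch
        "iscrowd": 0,
    }


# Placeholder values for illustration only.
dataset = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
    "categories": [{
        "id": 1,
        "name": "object",
        "keypoints": [f"kp_{i}" for i in range(NUM_JOINTS)],
        "skeleton": [],  # COCO skeleton pairs use 1-based joint indices; left empty here
    }],
    "annotations": [
        make_annotation(1, 1, [(100 + 10 * i, 200, 2) for i in range(NUM_JOINTS)], [90, 190, 130, 20]),
    ],
}
serialized = json.dumps(dataset, indent=2)
```

Comparing a real export against this layout field by field (flat keypoint list of length 3 × `num_joints`, non-zero `bbox`, matching `image_id`) is usually the quickest way to spot a malformed export.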