ViTAE-Transformer / ViTPose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Apache License 2.0

ViTPose fails to train on a small dataset #109

Open chrisrapson opened 11 months ago

chrisrapson commented 11 months ago

Has anybody else had problems training on a small dataset? I have one with 60 images in the training set and 10 in the validation set. All of the validation metrics are consistently 0 or -1.

I've tried:

In all epochs, the results look like:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] =  0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] =  0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] =  0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] =  0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] =  0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] =  0.000
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] =  0.000
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] =  0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] =  0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] =  0.000
2023-07-12 14:05:59,489 - mmpose - INFO - Epoch(val) [10][19]   AP: 0.0000, AP .5: 0.0000, AP .75: 0.0000, AP (M): 0.0000, AP (L): 0.0000, AR: 0.0000, AR .5: 0.0000, AR .75: 0.0000, AR (M): 0.0000, AR (L): 0.0000

or

2023-07-10 19:52:27,032 - mmpose - INFO - Epoch(val) [10][3]    AP: 0.0000, AP .5: 0.0000, AP .75: 0.0000, AP (M): -1.0000, AP (L): 0.0000, AR: 0.0000, AR .5: 0.0000, AR .75: 0.0000, AR (M): -1.0000, AR (L): 0.0000
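One plausible reading of the -1 entries (an assumption on my part, based on the pycocotools convention rather than anything ViTPose-specific): `COCOeval.summarize()` reports -1 for a metric when no ground-truth instances fall in that area range, so `AP (M): -1.0000` would mean the small validation set simply contains no medium-area people. A minimal sketch of that convention:

```python
def summarize_metric(precisions):
    """Mimic the pycocotools COCOeval.summarize() convention: entries
    of -1 mark "no ground truth in this area range"; if every entry is
    masked out, the metric is reported as -1 rather than 0."""
    valid = [p for p in precisions if p > -1]
    if not valid:
        return -1.0  # no GT instances matched this area range at all
    return sum(valid) / len(valid)

# Every entry masked (e.g. no medium-area instances in the val set):
print(summarize_metric([-1, -1, -1]))    # -1.0
# Some valid entries: an ordinary mean over the unmasked values:
print(summarize_metric([0.0, 0.5, -1]))  # 0.25
```

So a -1 is "not evaluable", while a genuine 0.000 means predictions were evaluated and none matched.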

It looks like ViTPose normally reports training results every 50 batches, but with a dataset this small there are fewer than 50 batches per epoch, so no training log line is ever printed.
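If the missing training logs are just a logging-interval issue, lowering `interval` in `log_config` should make the logger fire even on these short epochs (this is the mmcv/mmpose 0.x config convention that ViTPose's configs follow; the value 5 here is an arbitrary choice):

```python
# In the training config: with only ~60 training images, the default
# interval=50 exceeds the number of iterations per epoch, so the text
# logger never triggers. A smaller interval logs within each epoch.
log_config = dict(
    interval=5,  # log every 5 iterations instead of every 50
    hooks=[
        dict(type='TextLoggerHook'),
    ])
```

This only affects what gets printed; it would not explain the zero AP/AR values by itself.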

Training on the full COCO dataset seems to be running fine. Starting from scratch, the acc_pose slowly increases from ~0.02 to ~0.3 over the first two epochs. Loss shrinks from 0.0022 to 0.0017.

2023-07-10 13:38:16,921 - mmpose - INFO - Epoch [1][50/18727]   lr: 1.175e-06, eta: 27 days, 13:10:10, time: 0.605, data_time: 0.048, memory: 2544, heatmap_loss: 0.0022, acc_pose: 0.0182, loss: 0.0022, grad_norm: 0.0095
...
2023-07-10 19:28:12,419 - mmpose - INFO - Epoch [2][12300/18727]    lr: 1.188e-05, eta: 30 days, 13:49:08, time: 0.650, data_time: 0.001, memory: 2544, heatmap_loss: 0.0017, acc_pose: 0.3288, loss: 0.0017, grad_norm: 0.0058
DuinoDu commented 1 month ago

Is your custom data also human pose data with num_keypoints = 17, in the same order as the COCO keypoints?

chris-rapson-formus commented 1 month ago

My custom dataset was different, but as I said, I saw this strange behaviour even when training on a small subset of the COCO dataset.

Anyway, it's a long time ago now. I managed to train a ViTPose network which had reasonable accuracy, but it turned out that other architectures were better for our use case.