Are there any adjustments for training on "minitrain" compared to "train2017"? - Githubissues

giddyyupp / coco-minitrain

a subset of coco dataset for faster experimentation

Are there any adjustments for training on "minitrain" compared to "train2017"? #24

Closed knn217 closed 1 year ago

knn217 commented 1 year ago

Hello, are the models mentioned on the GitHub page trained with exactly the same methods, params, epochs, etc. as on their original dataset? The only difference is the training set itself, right? (from "train2017" to "minitrain")

I'm asking because I'm currently training Deformable DETR on "minitrain" for my university final project. I'm 26 epochs in (the maximum is 50 epochs) and all the APs are still at 0. The model seems to perform quite well on the original dataset (I even checked with "val2017"). I kept all the config the same as the original model. I should mention that I divided "minitrain" into 5 folders of 5000 images each, since Google Colab runs into errors with Drive if there are too many files in one folder; I think this shouldn't affect the result.

So did you guys encounter this issue in your training as well? Is this normal, and should I just wait until 50 epochs, or is there something I'm supposed to change that I'm not aware of? ("minitrain" and the annotations are from the shared file, and the val set is just "val2017")

giddyyupp commented 1 year ago

Hi, we have not changed any parameters for training models on the minitrain dataset. It looks like there is a problem. If you share the logs, maybe we can help you locate it.

knn217 commented 1 year ago

Sorry about this, please wait for me to retrain the model. I deleted the output files and logs to make space for the full COCO_2017 training set.

Do I need to train it back to 26 epochs? Even in the early epochs, the APs remain at 0 and don't improve. Some of the ARs improve but never exceed 0.2; the rest remain at 0 like the APs.

knn217 commented 1 year ago

Ok, 6 epochs done, here's the link to the log file: https://drive.google.com/file/d/1-Dv4kpCgciUHr4Ig3iRYlsf6kyJgT83W/view?usp=sharing

Here's the result on val2017:

epoch 0: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.001 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003

epoch 1: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.006

epoch 2: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.005

epoch 3: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.008

epoch 4: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007

epoch 5: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.005 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.006 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.011

Is this normal with other models?

knn217 commented 1 year ago

My bad, I should've checked the results in the logs, since those aren't rounded. The model seems to be converging, just really slowly. I'll mess with the learning rates to see if there are any improvements.
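For anyone who wants to do the same check, here is a minimal sketch of pulling the unrounded AP out of a training log. It assumes the log is a text file with one JSON object per line and that the 12 COCO summary numbers sit under a key such as `test_coco_eval_bbox`, with AP@[0.50:0.95] first (the usual pycocotools order). The key name and the helper name `read_ap_per_epoch` are assumptions for illustration, so check them against your own log file.

```python
import json
from pathlib import Path


def read_ap_per_epoch(log_path, key="test_coco_eval_bbox"):
    """Return a list of (epoch, AP@[0.50:0.95]) pairs from a JSON-lines log.

    Assumption: each non-empty line is a JSON object, and `key` holds the
    COCO summary list whose first entry is AP@[0.50:0.95].
    """
    results = []
    for line in Path(log_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        stats = record.get(key)
        if stats:
            # Fall back to the running count if the record has no "epoch" field.
            epoch = record.get("epoch", len(results))
            results.append((epoch, stats[0]))
    return results


def print_ap_per_epoch(log_path):
    """Print AP with more digits than the 3-decimal console summary shows."""
    for epoch, ap in read_ap_per_epoch(log_path):
        print(f"epoch {epoch}: AP = {ap:.6f}")
```

Calling `print_ap_per_epoch("log.txt")` (the path is a placeholder) shows whether values that print as 0.000 in the console summary are actually creeping upward.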

knn217 commented 1 year ago

Update on the situation: the original Deformable DETR was trained with a batch size of 32 and a learning rate of 2e-4. Since my hardware can only train with a batch size of 2, the learning rate should have been scaled down by k = 32/2 = 16 (linear scaling) or by sqrt(k) = 4 (square-root scaling), i.e. to (2e-4)/16 or (2e-4)/4. I'm training with (2e-4)/16 right now; 1 epoch in, the result already looks promising:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.010
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.038
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.003
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.015
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.027
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.076
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.097
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.087
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.150

Hope this is helpful if anyone else runs into the same issue. Make sure to check the GitHub page for the training params as well. I had just gone with the default params, which were set to a batch size of 2 and a learning rate of 2e-4.
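The learning-rate arithmetic from this update can be sketched as below. This is only an illustration of the two scaling rules mentioned (divide by k, or by sqrt(k), where k is the ratio of the reference batch size to your own); `scaled_lr` is a hypothetical helper, not something from the Deformable DETR code, and which rule works better probably depends on the model and optimizer.

```python
import math


def scaled_lr(base_lr, base_batch_size, batch_size, rule="linear"):
    """Scale a reference learning rate to a different total batch size.

    k = base_batch_size / batch_size
    "linear": base_lr / k
    "sqrt":   base_lr / sqrt(k)
    """
    k = base_batch_size / batch_size
    if rule == "linear":
        return base_lr / k
    if rule == "sqrt":
        return base_lr / math.sqrt(k)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    # Numbers from this thread: reference batch size 32 at lr 2e-4,
    # local hardware limited to batch size 2, so k = 16.
    for rule in ("linear", "sqrt"):
        lr = scaled_lr(2e-4, base_batch_size=32, batch_size=2, rule=rule)
        print(f"{rule}: lr = {lr:.3e}")
```

With the numbers above, the linear rule gives (2e-4)/16 = 1.25e-5 and the square-root rule gives (2e-4)/4 = 5e-5; the run reported here used the linear value.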