Hello,
Thank you for open-sourcing this! Looks really good!
I am training a custom model as well and will upload it to GitHub soon, but I had a few questions on the training front.
1) Will training just the heads suffice? I am training the full network, so it is very slow, but if I start from pre-trained COCO weights, is the performance good for your custom classes?
2) How many images did you train it with?
3) What is the typical time per epoch on your setup when training just the heads? I am training the full network on a single GPU (NVIDIA P100, 16 GB) with 3 images per GPU (batch size = 3), and it takes roughly 3 seconds per batch (or ~1 second per image). I have 70,000 images in my dataset, so that works out to ~70,000 seconds, or roughly 20 hours, per epoch (rough math in the sketch below). At that rate I can never train it for many epochs... maybe 10 at most. Is it much faster to just train the heads?
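For reference, here is the back-of-the-envelope estimate I'm using; the numbers are just my measurements above, nothing repo-specific:

```python
# Rough per-epoch time estimate for full-network training on my setup.
images_total = 70_000       # images in my dataset
batch_size = 3              # images per GPU x 1 GPU (P100)
seconds_per_batch = 3.0     # measured wall-clock time per training step

steps_per_epoch = images_total / batch_size            # ~23,333 steps
epoch_seconds = steps_per_epoch * seconds_per_batch    # ~70,000 s
print(f"~{epoch_seconds / 3600:.1f} hours per epoch")  # ~19.4 hours
```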
Hi, thanks a lot for your comment!
For your questions:
Yes, I think it's better to train only the heads of the network if you don't have a huge dataset. It's not only about training time; it also protects the model from overfitting, since fewer parameters are tuned. It performed very well on my dataset of 200 images, and I didn't even need to adjust the learning rate.
400 images with two classes. About 6-8 instances per image.
Yes, it's faster to train only the top layers rather than the whole model, especially for your dataset. I think you could try retraining 1 layer, 2 layers, and so on to see the effect (a rough sketch of what that looks like is below). My dataset is not as large as yours, so I can't guarantee that training only the heads will be enough for you.
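A minimal sketch of the two-stage approach, assuming a Matterport Mask_RCNN-style API (`model.train` with a `layers` argument); the config values, weights path, and `dataset_train`/`dataset_val` (your prepared `mrcnn.utils.Dataset` subclasses) are placeholders to adapt to your own repo:

```python
# Sketch only: assumes the Matterport Mask_RCNN-style training API.
import mrcnn.model as modellib
from mrcnn.config import Config

class MyConfig(Config):          # hypothetical config for a 2-class dataset
    NAME = "my_dataset"
    NUM_CLASSES = 1 + 2          # background + 2 classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 3

config = MyConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")

# Start from pre-trained COCO weights, skipping the layers whose shapes
# depend on the number of classes.
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# Stage 1: train only the heads; the backbone stays frozen.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=10, layers="heads")

# Optional stage 2: unfreeze more of the backbone ("4+" = ResNet stage 4
# and up) at a lower learning rate if the heads alone aren't enough.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=20, layers="4+")
```

The `layers` argument is what controls how much of the network is retrained ("heads", "5+", "4+", "3+", or "all"), so you can widen the trainable part gradually and compare results.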