offchan42 closed this 4 years ago
Sorry for the late reply. To build a small SSD, you first need to understand how SSD is designed. I recommend reading the paper a few more times to grasp the idea (no offense, I had to do that too).
In short, you don't need that many output layers, since their purpose is to learn the features of objects at different sizes. If your data only has small objects (e.g. roughly image_size / 10), you can use only the outputs from conv4 and conv7 (or just conv4). You may also want to read more about the receptive field to understand how the depth of a CNN affects what each output location can see.
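To make the receptive field point concrete, here is a rough sketch that computes how many input pixels one output location sees after a stack of conv/pool layers. The layer list is a made-up tiny backbone, not the SSD or VGG configuration, and the helper name `receptive_field` is just something I picked for illustration. It ignores dilation and padding effects at the borders.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive_field, cumulative_stride), both in input pixels.

    layers: one (kernel_size, stride) pair per conv/pool layer,
    ordered from the input to the output. Dilation is not modelled.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; step between pixels is 1
    for kernel, stride in layers:
        # Each layer widens the field by (kernel - 1) steps of the current jump.
        rf += (kernel - 1) * jump
        # Striding makes neighbouring outputs sit further apart in input space.
        jump *= stride
    return rf, jump


# Hypothetical tiny backbone (an assumption, not from the SSD paper):
# conv3x3, pool2x2/2, conv3x3, pool2x2/2, conv3x3
tiny_backbone = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]

rf, stride = receptive_field(tiny_backbone)
print(f"receptive field: {rf}px, stride: {stride}px")
```

You can compare the result with your typical object size. If the receptive field at the layer you attach the output to is much smaller than your objects, that layer probably can't see a whole object, and you may need more depth or more downsampling before it.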
I don't want any VGG or GiganticResNexT-9000 base model. I want a few conv layers, trained on an easy dataset, e.g. tracking a PS joystick, which cannot deform or change color like a lizard.