Is there a way to train on a fixed scale?

Hi there,

I'm working on a project where I need to detect objects whose size matters. I'm building a counter to check that I have the right number of washers, screws and other small parts, but the problem is that I also need to tell how many of each size there are.

For example: I need three 5mm washers and two 4mm washers. I tried training with a fixed camera distance, but YOLO recognizes a washer as both classes (5mm and 6mm). I also tried other object detection methods such as template matching, but they are either rotation-sensitive or, in the rotation-invariant variants, far too slow to run in real time.

Is it possible to limit training to only one layer so that the model uses a fixed scale? Then I would be able to tell a 4mm washer from a 5mm washer, an M5 screw from an M6 one, and so on. If it's possible, how do I do it?
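
To make the fixed-scale idea concrete, here is a minimal sketch of how size could be recovered from box dimensions when the camera distance never changes. It is not a YOLOv7 feature or option. It assumes the detector only outputs boxes for a generic "washer" class, and that one pixel always covers the same number of millimetres. `MM_PER_PIXEL`, the outer diameters in `EXPECTED_DIAMETER_MM`, and all function names are hypothetical placeholders that would have to be measured or renamed for a real setup.

```python
from collections import Counter

# Assumption: calibration measured once at the fixed camera distance.
MM_PER_PIXEL = 0.05

# Assumption: placeholder outer diameters per size label; measure your own parts.
EXPECTED_DIAMETER_MM = {"4mm": 9.0, "5mm": 10.0, "6mm": 12.0}


def nearest_size_label(width_px, height_px,
                       mm_per_pixel=MM_PER_PIXEL,
                       expected=EXPECTED_DIAMETER_MM):
    """Map one bounding box (in pixels) to the closest size label."""
    # A washer seen from above is round, so averaging width and height
    # estimates its outer diameter regardless of in-plane rotation.
    diameter_mm = 0.5 * (width_px + height_px) * mm_per_pixel
    return min(expected, key=lambda label: abs(expected[label] - diameter_mm))


def count_sizes(boxes_px):
    """Count how many boxes fall into each size label."""
    return Counter(nearest_size_label(w, h) for w, h in boxes_px)


def missing_parts(required, counted):
    """Return {label: shortfall} for every size with too few parts."""
    return {
        label: need - counted.get(label, 0)
        for label, need in required.items()
        if counted.get(label, 0) < need
    }


# Example: (width, height) of five detected boxes, in pixels.
boxes = [(201, 199), (198, 202), (200, 200), (181, 179), (179, 180)]
counted = count_sizes(boxes)
shortfall = missing_parts({"5mm": 3, "4mm": 2}, counted)
```

The nearest-neighbour lookup only works when the size classes differ by clearly more than the box jitter in pixels, so the calibration and camera resolution decide whether 4mm and 5mm parts can be separated this way.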

WongKinYiu / yolov7, issue #1010