KI-1-AI-Sec / adversarial-yolo

Fork of the EAVISE adversarial-yolo repository from the paper "Fooling automated surveillance cameras: adversarial patches to attack person detection"
MIT License

Dataset image size is incorrectly used as inference model input size #3

Closed: mkrupczak3 closed this issue 10 months ago

mkrupczak3 commented 10 months ago

It seems (per @SlicedBacon's review) that we may be accidentally using the dataset image size as the inference model's input size. This may explain the problems we've encountered in #2: our dataset images are 512x512, while our yolov8 model has an input size of 640x640. A mismatch between these values could cause det_loss to produce random, invalid results, which would preclude any possibility of optimization.

https://github.com/KI-1-AI-Sec/adversarial-yolo/blob/36f55a776b82a5cbbfc1b49802f929ff2d0ba0d2/train_patch.py#L107C20-L107C20
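A minimal sketch of the kind of guard that would rule this out, assuming a PyTorch tensor batch of patched images (the `p_img_batch` name and the sizes are assumptions, not the actual variables at the linked line): resize to the detector's expected input size right before the forward pass instead of relying on the dataset image size matching it.

```python
import torch.nn.functional as F

MODEL_INPUT_SIZE = 640  # assumed yolov8 input size; dataset images are 512x512


def prepare_for_inference(p_img_batch):
    """Resize a (B, 3, H, W) batch of patched images to the model input size."""
    if p_img_batch.shape[-2:] != (MODEL_INPUT_SIZE, MODEL_INPUT_SIZE):
        p_img_batch = F.interpolate(
            p_img_batch,
            size=(MODEL_INPUT_SIZE, MODEL_INPUT_SIZE),
            mode="bilinear",
            align_corners=False,
        )
    return p_img_batch
```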

mkrupczak3 commented 10 months ago

I tried switching the input size in the patch trainer from 512 to 640, and it did not go well. Something seems to be hard-coded to 512 deep within the code that evaluates the various scores and applies the adversarial patch.

It may be easier for us to instead train a 512x512 yolov8 model that matches the input size of the old yolov2 one. This is very easy to do with training arguments for yolov8. The only drawback is that we cannot transfer-learn from the yolov8 COCO model, since its structure will be different.
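A rough sketch of what that training call could look like with the ultralytics package, assuming a dataset YAML at a hypothetical path `inria.yaml`; starting from the architecture `.yaml` rather than a `.pt` checkpoint means no COCO-pretrained weights, as noted above.

```python
from ultralytics import YOLO

# Architecture definition only, no pretrained weights (hypothetical choice of yolov8n).
model = YOLO("yolov8n.yaml")

# imgsz=512 sets the training image size to match the old yolov2 setup.
model.train(data="inria.yaml", imgsz=512, epochs=100)
```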

If we have a 512x512 model, the only change needed in this codebase would be to pass ultralytics a config telling it the non-standard input size and the weights to use.
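In practice that change might look like the sketch below (paths are hypothetical): load the 512x512-trained weights and force the non-standard input size at inference so it matches the dataset image size.

```python
from ultralytics import YOLO

# Hypothetical path to the 512x512-trained weights from the step above.
model = YOLO("weights/yolov8n_512.pt")

# Force imgsz=512 at inference so the model input size matches the 512x512 dataset images.
results = model.predict("data/inria/example.png", imgsz=512)
```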

mkrupczak3 commented 10 months ago

Upon review, it doesn't seem like this is actually the issue. We will need to run more experiments to determine the root cause of #2.