WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

Could you clarify what happens in build_targets() function? #149

Open LaCandela opened 2 years ago

LaCandela commented 2 years ago

Could you explain a bit how the anchor box to ground truth box matching is performed in utils.loss.build_targets()?

https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/utils/loss.py#L152

It seems that somehow the aspect ratio is used but I don't understand how.

Thanks!

jacoblubecki commented 2 years ago

Not the author, but spent an unreasonable amount of time reading through this function trying to understand it so gonna take a stab at an answer, at least regarding the aspect ratio check.

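The ratios are calculated from the gain-scaled targets: each target's width and height are divided by the width and height of each anchor, and the resulting pairs are then filtered based on the anchor_t hyperparameter. So strictly speaking it is a target-to-anchor size ratio rather than the box's own aspect ratio. The default value is, I think, 4.0, which means a target is ignored for a given anchor if either ratio is greater than 4.0 or less than 0.25. A target that fails the check for every anchor on a layer contributes nothing on that layer.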
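To make the check concrete, here is a minimal pure-Python sketch of how I read it, not the repo's actual code (which does this with tensors). The function name `anchor_matches`, the example anchors and targets, and the 4.0 default are all assumptions for illustration. Each target's width and height are divided by each anchor's width and height, and the pair is kept only if the worst of those ratios, in either direction, stays below `anchor_t`.

```python
def anchor_matches(target_wh, anchors_wh, anchor_t=4.0):
    """Return (anchor_index, target_index) pairs that pass the ratio test.

    target_wh and anchors_wh are lists of (width, height) in grid units.
    anchor_t=4.0 is assumed to be the default hyperparameter value.
    """
    matches = []
    for ai, (aw, ah) in enumerate(anchors_wh):
        for ti, (tw, th) in enumerate(target_wh):
            rw, rh = tw / aw, th / ah            # target-to-anchor size ratios
            worst = max(rw, 1 / rw, rh, 1 / rh)  # largest mismatch, either direction
            if worst < anchor_t:
                matches.append((ai, ti))
    return matches


# Made-up anchors and targets, purely for illustration.
anchors = [(1.5, 2.0), (4.0, 4.0), (10.0, 6.0)]
targets = [(2.0, 2.0), (1.0, 12.0)]

print(anchor_matches(targets, anchors))                # [(0, 0), (1, 0)]
print(anchor_matches(targets, anchors, anchor_t=8.0))  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
```

With the assumed default of 4.0, the skinny 1x12 target matches none of these anchors and is dropped. Raising the threshold to 8.0 lets it pair with the two smaller anchors.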

If I had to guess, this is for one of a few possible reasons:

  1. The default anchors aren't very useful above/below those thresholds and so trying to train against objects with those shapes might make it harder for the detector to converge to something that generalizes better (i.e. because there aren't a lot of actual objects to classify with such narrow shapes).
  2. Many such shapes probably arise near the image boundaries, where partial crops lead to overly skinny bounding boxes (e.g. when part of an object is cropped out during a translation or scaling). The resulting box may not actually contain much (if any, depending on the label quality) of the target class. Training on those would make model performance worse in many scenarios, because the model would see samples teaching it that background belongs to a non-background class.
  3. Some other reason that I am not informed enough to understand :)

The larger you set this value, the more tolerant of narrow bounding boxes your training cycle will become. Hopefully that was helpful at least a bit.

ZyrianovS commented 2 years ago

@jacoblubecki @LaCandela https://github.com/ultralytics/yolov5/issues/4158#issuecomment-886627065