Closed zhangkui669 closed 4 years ago
Yes, we use RPN/Fast R-CNN-like bounding box coefficient encoding rather than YOLO-like encoding. In practice, we find that the RPN/Fast R-CNN-like encoding and gt assignment strategy significantly improve recall in crowded scenes. If we use the YOLO-like strategy, the predicted boxes align with the gt boxes better, but the recall is low.
This is mainly because more anchors can be assigned to one gt box. In the pedestrian detection/tracking application, recall is more important than accurate localization.
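To illustrate why the assignment strategy changes how many anchors are trained per gt box, here is a minimal sketch. It is a simplified toy, not the repository's code: the function names, the boxes, and the `fg_thresh=0.5` threshold are all assumptions chosen for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def assign_rpn_like(anchors, gt, fg_thresh=0.5):
    """RPN-like (simplified): every anchor with IoU >= fg_thresh is positive.

    fg_thresh is a hypothetical value, not taken from the thread.
    """
    return [i for i, a in enumerate(anchors) if iou(a, gt) >= fg_thresh]


def assign_yolo_like(anchors, gt):
    """YOLO-like (simplified): only the single best-matching anchor is positive."""
    ious = [iou(a, gt) for a in anchors]
    return [max(range(len(anchors)), key=ious.__getitem__)]


# One tall pedestrian-shaped gt box and four made-up anchors around it.
gt = (10.0, 10.0, 30.0, 70.0)
anchors = [
    (10.0, 10.0, 30.0, 70.0),  # coincides with the gt box
    (12.0, 12.0, 32.0, 72.0),  # slightly shifted
    (8.0, 6.0, 28.0, 66.0),    # slightly shifted the other way
    (40.0, 10.0, 60.0, 70.0),  # no overlap
]

rpn_pos = assign_rpn_like(anchors, gt)
yolo_pos = assign_yolo_like(anchors, gt)
print("RPN-like positives:", rpn_pos)
print("YOLO-like positives:", yolo_pos)
```

In this toy setup the threshold rule marks three anchors as positive while the best-match rule marks one, which is the "more anchors per gt box" effect described above.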
Another question:
When I trained the model on my own dataset, the loss became negative and the box loss kept increasing.
How can I keep the loss from becoming negative?
I have the same problem: when I train for more than one epoch, the loss becomes negative. Could you tell me how to solve this? Thank you very much!
@Zhongdao
Now I use fixed loss weights. Maybe the training data was too simple. I expanded my training data and trained with FairMOT, which uses the same weighting method, and the loss is always positive.
@zhangkui669 @CharlesYXW That's great, but it's okay for the loss to be negative.
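A short sketch of why a negative total can be harmless. It assumes the total loss uses learnable log-variance weights of the form 0.5 * sum(exp(-s_i) * L_i + s_i); that form is my reading of the "weight method" mentioned in this thread, not something the thread states explicitly. The function name and all numbers are made up for illustration.

```python
import math


def weighted_total(task_losses, log_vars):
    """Uncertainty-style weighting (assumed form):
    0.5 * sum(exp(-s_i) * L_i + s_i), where each s_i is a learnable scalar.
    """
    return 0.5 * sum(math.exp(-s) * l + s for l, s in zip(task_losses, log_vars))


# Three non-negative task losses (e.g. box, confidence, identity); values are hypothetical.
losses = [0.05, 0.02, 0.10]

# With s_i = 0 the total is just half the plain sum, so it is positive.
early = weighted_total(losses, [0.0, 0.0, 0.0])

# As the task losses shrink, the optimizer can push s_i below zero,
# and the additive s_i terms pull the total below zero.
late = weighted_total(losses, [-2.0, -3.0, -1.5])

print("total with s = 0:      ", early)
print("total with negative s: ", late)
```

Under this assumed form, each term exp(-s) * L + s is minimized at s = ln(L), where it equals 1 + ln(L); that is negative whenever L < 1/e. So a negative total only says that the individual task losses have become small, while each task loss itself stays non-negative.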
Hi,
While reading your source code for yololayer, I noticed that when the network output xywh is transformed to the actual xywh, you use:
    xy = xy * anchor_size + grid_offset * stride
    wh = exp(wh) * anchor_size
But the formulation in the original YOLO paper is:
    xy = sigmoid(xy) * stride + grid_offset * stride
    wh = exp(wh) * anchor_size
My concern is this: with sigmoid, the raw output is squashed to (0, 1), so the center offset of the detected box is bounded by the stride. In your formulation, the xy offset scales with anchor_size instead, and the anchor sizes vary.
Best regards
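The two decoding rules compared in the question above can be sketched side by side. This is a minimal illustration built only from the formulas as written in the question; the function names, the grid cell, the stride, and the anchor size are hypothetical and not taken from the repository.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_anchor_scaled(t_xy, t_wh, grid_xy, stride, anchor_wh):
    """RPN/Fast R-CNN-like: the center offset is scaled by the anchor size."""
    x = t_xy[0] * anchor_wh[0] + grid_xy[0] * stride
    y = t_xy[1] * anchor_wh[1] + grid_xy[1] * stride
    w = math.exp(t_wh[0]) * anchor_wh[0]
    h = math.exp(t_wh[1]) * anchor_wh[1]
    return x, y, w, h


def decode_yolo_like(t_xy, t_wh, grid_xy, stride, anchor_wh):
    """YOLO-like: the center offset is squashed to (0, 1) and scaled by the stride."""
    x = sigmoid(t_xy[0]) * stride + grid_xy[0] * stride
    y = sigmoid(t_xy[1]) * stride + grid_xy[1] * stride
    w = math.exp(t_wh[0]) * anchor_wh[0]
    h = math.exp(t_wh[1]) * anchor_wh[1]
    return x, y, w, h


# Hypothetical setup: cell (3, 4) at stride 32 with a tall 64x192 anchor.
grid_xy, stride, anchor_wh = (3, 4), 32, (64.0, 192.0)
raw_xy, raw_wh = (1.0, 1.0), (0.0, 0.0)

rpn_box = decode_anchor_scaled(raw_xy, raw_wh, grid_xy, stride, anchor_wh)
yolo_box = decode_yolo_like(raw_xy, raw_wh, grid_xy, stride, anchor_wh)
print("anchor-scaled:", rpn_box)
print("YOLO-like:    ", yolo_box)
```

For the same raw output, the YOLO-like center stays inside the cell that predicted it (x between 96 and 128 here), while the anchor-scaled center can move by a multiple of the anchor size and so depends on which anchor made the prediction. The width and height are identical under both rules.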