The ClusterDetections layer of DetectNet has a bug when processing images in portrait format. It limits the number of grid cells in the vertical dimension to the number of cells in the horizontal dimension, so any object (bounding box) that lies outside the top square crop of the input image is ignored.
The line https://github.com/NVIDIA/caffe/blob/f7801e974f883b97e1b99e9dd23457a6d6cb4b68/python/caffe/layers/detectnet/clustering.py#L153 should be
cvg_val = net_cvg[0, 0:grid_sz_y, 0:grid_sz_x]
instead.
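
For illustration, here is a minimal NumPy sketch of how this kind of truncation can happen. It assumes the current line uses chained slicing of the form `net_cvg[0, 0:grid_sz_y][0:grid_sz_x]`; that form is my assumption based on the symptom, not a quote of the file. The grid sizes and the dummy coverage array are made up for the example.

```python
import numpy as np

# Hypothetical portrait grid: fewer cells horizontally than vertically.
grid_sz_x, grid_sz_y = 3, 5

# Dummy coverage map shaped (channels, rows, cols), standing in for net_cvg.
net_cvg = np.arange(grid_sz_y * grid_sz_x).reshape(1, grid_sz_y, grid_sz_x)

# Assumed buggy form: the second [] slices the rows again, not the columns,
# so the row count is capped at grid_sz_x.
chained = net_cvg[0, 0:grid_sz_y][0:grid_sz_x]

# Proposed form: a single index expression slices rows and columns separately.
fixed = net_cvg[0, 0:grid_sz_y, 0:grid_sz_x]

print(chained.shape)  # (3, 3): only the top square of the grid survives
print(fixed.shape)    # (5, 3): the full portrait grid
```

With a landscape grid (grid_sz_x >= grid_sz_y) both expressions return the same array, which would explain why the problem only shows up for portrait images.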