HaojunYu1998 / UltraDet


pred_scores = pred.get("pred_classes").float().sigmoid().numpy() ? #4

Closed imzhangyd closed 8 months ago

imzhangyd commented 9 months ago

Hi, thanks for your wonderful work! When calculating R@16 in the 'calculate_mmdet_ar' function, there is the following line: pred_scores = pred.get("pred_classes").float().sigmoid().numpy(). I am confused about it.

imzhangyd commented 8 months ago

I think it should perhaps be pred_scores = (pred.get("objectness_logits").float().sigmoid().numpy()) instead.

xjtulyc commented 8 months ago

Please see line 136 of ultrasound_vid/modeling/heads/fast_rcnn.py. There, result.pred_classes actually holds logits, so it still needs to go through a sigmoid before it can be used as a score. The field is named this way to stay consistent with the variable naming convention used in Faster R-CNN.
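To make the point concrete, here is a minimal sketch of what the sigmoid does to values stored under "pred_classes". It uses only the Python standard library rather than the tensor calls from the repository, and the logit values and the `sigmoid` helper are made up for illustration, not taken from UltraDet.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1), in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw values, standing in for what the "pred_classes" field holds.
raw_logits = [-2.0, 0.0, 3.0]

# Same idea as pred.get("pred_classes").float().sigmoid(), one value at a time.
pred_scores = [sigmoid(v) for v in raw_logits]

for logit, score in zip(raw_logits, pred_scores):
    print(f"logit={logit:+.1f} -> score={score:.4f}")
```

The sigmoid is monotonic, so ranking detections by logit or by score gives the same order; the conversion only matters when the values are compared against a threshold in [0, 1].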

You can refer to the documentation of the fast_rcnn_inference function for further details:

Call `fast_rcnn_inference_single_image` for all images.

Args:
    boxes (list[Tensor]): A list of Tensors of predicted class-specific or class-agnostic
        boxes for each image. Element i has shape (Ri, K * 4) if doing
        class-specific regression, or (Ri, 4) if doing class-agnostic
        regression, where Ri is the number of predicted objects for image i.
        This is compatible with the output of :meth:`FastRCNNOutputLayers.predict_boxes`.
    scores (list[Tensor]): A list of Tensors of predicted class scores for each image.
        Element i has shape (Ri, K + 1), where Ri is the number of predicted objects
        for image i. Compatible with the output of :meth:`FastRCNNOutputLayers.predict_probs`.
    image_shapes (list[tuple]): A list of (width, height) tuples for each image in the batch.
    score_thresh (float): Only return detections with a confidence score exceeding this
        threshold.
    nms_thresh (float):  The threshold to use for box non-maximum suppression. Value in [0, 1].
    topk_per_image (int): The number of top scoring detections to return. Set < 0 to return
        all detections.

Returns:
    instances: (list[Instances]): A list of N instances, one for each image in the batch,
        that stores the topk most confidence detections.
    kept_indices: (list[Tensor]): A list of 1D tensor of length of N, each element indicates
        the corresponding boxes/scores index in [0, Ri) from the input, for image i.
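As a rough illustration of how score_thresh and topk_per_image from the docstring above interact, the sketch below filters a small score matrix in plain Python. It is not the library implementation: `filter_scores` is a hypothetical helper, non-maximum suppression is left out entirely, and treating the last column as the background class is an assumption of this sketch.

```python
from typing import List, Tuple


def filter_scores(
    scores: List[List[float]],
    score_thresh: float,
    topk_per_image: int,
) -> List[Tuple[int, int, float]]:
    """Return (box_index, class_index, score) triples, best first.

    `scores` has one row per predicted box and K + 1 columns.
    Assumption of this sketch: the last column is background and is dropped.
    NMS is intentionally not performed here.
    """
    candidates = []
    for box_idx, row in enumerate(scores):
        for class_idx, score in enumerate(row[:-1]):
            # Keep only detections whose score exceeds the threshold.
            if score > score_thresh:
                candidates.append((box_idx, class_idx, score))

    # Highest-scoring detections first.
    candidates.sort(key=lambda c: c[2], reverse=True)

    # A negative topk_per_image means "return all detections".
    if topk_per_image >= 0:
        candidates = candidates[:topk_per_image]
    return candidates


# Made-up scores for three boxes with K = 1 foreground class.
example_scores = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]

top1 = filter_scores(example_scores, score_thresh=0.5, topk_per_image=1)
all_kept = filter_scores(example_scores, score_thresh=0.5, topk_per_image=-1)
```

The box indices in the returned triples play the role of kept_indices: they point back into the input rows, in the range [0, Ri).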

Please let me know if you need any further assistance with this matter.