Bing-su / adetailer

Auto detecting, masking and inpainting with detection model.
GNU Affero General Public License v3.0

[Bug]: two faces are detected for one face #669

Closed elistys closed 2 weeks ago

elistys commented 1 month ago

Describe the bug

A single face is being detected as two faces, so it is processed twice.

The situation remains the same whether I set the “Detection model confidence threshold” to 0.01 or 1.0.

There are multiple people on the screen, so I cannot set “Mask only the top k largest (0 to disable)” to 1.

I don't know if this is a bug or a problem with accuracy.

Is there any way to improve this?

Environment

ADetailer: v24.1.2
Model: mediapipe_face_full
A1111 webUI: v1.7.0
Hardware: M2 Mac mini

Steps to reproduce

Use mediapipe_face_full.

Screenshots

[Images: 001, 002]

Console logs, from start to end.

mediapipe: 2 detected.
100%|█████████████████████████████████████████████| 9/9 [00:11<00:00,  1.29s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:09<00:00,  1.08s/it]

List of installed extensions

No response

dosubot[bot] commented 1 month ago

Hello @elistys 👋, I'm here to help you with any bugs, questions, or contributions you have regarding the repository.

To improve face detection accuracy and avoid detecting and processing the same face twice when using the mediapipe_face_full model in ADetailer v24.1.2, you can implement a mechanism to filter out overlapping or duplicate bounding boxes. One effective approach is to use Intersection over Union (IoU) to determine if two bounding boxes overlap significantly and then keep only one of them.
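For reference, IoU is the standard overlap measure for two boxes: the area they share divided by the area they cover together. The symbols A and B below are just labels for two detected boxes, and the numbers are illustrative rather than taken from the screenshots in this issue.

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
                   = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}

% Worked example with A = (0, 0, 100, 100) and B = (10, 10, 110, 110),
% each given as (x_1, y_1, x_2, y_2) in pixels:
|A \cap B| = 90 \times 90 = 8100, \qquad
|A \cup B| = 10000 + 10000 - 8100 = 11900, \qquad
\mathrm{IoU}(A, B) = \frac{8100}{11900} \approx 0.68
```

With a threshold of 0.5, these two boxes would be treated as the same face and one of them dropped.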

Here is an example of how you can modify the mediapipe_face_detection function to include this logic:

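import numpy as np
from PIL import Image

# ADetailer helpers; import paths assumed to match the extension's own mediapipe module
from adetailer import PredictOutput
from adetailer.common import create_mask_from_bbox


def mediapipe_face_detection(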
    model_type: int, image: Image.Image, confidence: float = 0.3, iou_threshold: float = 0.5
) -> PredictOutput[float]:
    import mediapipe as mp

    img_width, img_height = image.size

    mp_face_detection = mp.solutions.face_detection
    draw_util = mp.solutions.drawing_utils

    img_array = np.array(image)

    with mp_face_detection.FaceDetection(
        model_selection=model_type, min_detection_confidence=confidence
    ) as face_detector:
        pred = face_detector.process(img_array)

    if pred.detections is None:
        return PredictOutput()

    preview_array = img_array.copy()

    bboxes = []
    for detection in pred.detections:
        draw_util.draw_detection(preview_array, detection)

        bbox = detection.location_data.relative_bounding_box
        x1 = bbox.xmin * img_width
        y1 = bbox.ymin * img_height
        w = bbox.width * img_width
        h = bbox.height * img_height
        x2 = x1 + w
        y2 = y1 + h

        bboxes.append([x1, y1, x2, y2])

    # Filter out overlapping bounding boxes
    bboxes = filter_overlapping_bboxes(bboxes, iou_threshold)

    masks = create_mask_from_bbox(bboxes, image.size)
    preview = Image.fromarray(preview_array)

    return PredictOutput(bboxes=bboxes, masks=masks, preview=preview)

def filter_overlapping_bboxes(bboxes, iou_threshold):
    def iou(box1, box2):
        x1, y1, x2, y2 = box1
        x1_, y1_, x2_, y2_ = box2

        xi1 = max(x1, x1_)
        yi1 = max(y1, y1_)
        xi2 = min(x2, x2_)
        yi2 = min(y2, y2_)

        inter_area = max(0, xi2 - xi1) * max(0, yi2 - yi1)
        box1_area = (x2 - x1) * (y2 - y1)
        box2_area = (x2_ - x1_) * (y2_ - y1_)

        union_area = box1_area + box2_area - inter_area

        # Guard against degenerate (zero-area) boxes to avoid dividing by zero
        return inter_area / union_area if union_area > 0 else 0.0

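    # Greedy filter: keep a box only if it does not overlap a box already kept.
    # Comparing against kept boxes (not discarded ones) avoids dropping a box
    # merely because it overlaps a duplicate that was itself removed.
    filtered_bboxes = []
    for bbox in bboxes:
        if all(iou(bbox, kept) <= iou_threshold for kept in filtered_bboxes):
            filtered_bboxes.append(bbox)

    return filtered_bboxes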

In this code, the filter_overlapping_bboxes function compares each bounding box against the ones before it and drops any box whose IoU with an earlier box exceeds iou_threshold. The detector may still report the duplicate, but only one box per face is passed on for masking and inpainting, so the same face is not processed twice [1].
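To see how this kind of filter behaves without running the web UI, here is a minimal standalone sketch. It is not ADetailer's code: the names iou and dedupe_bboxes and the box coordinates are made up for illustration, and it assumes boxes are given as [x1, y1, x2, y2] in pixels.

```python
def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedupe_bboxes(bboxes, iou_threshold=0.5):
    """Keep a box only if it does not overlap any already-kept box too much."""
    kept = []
    for box in bboxes:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


# Hypothetical detections: two boxes on the same face, one on another face.
boxes = [
    [100.0, 100.0, 200.0, 200.0],  # face A
    [105.0, 102.0, 205.0, 198.0],  # near-duplicate of face A
    [400.0, 120.0, 480.0, 210.0],  # face B
]
result = dedupe_bboxes(boxes)
print(len(result), "boxes kept")

# A small box nested inside a larger one has a low IoU and is NOT removed.
nested = [[100.0, 100.0, 200.0, 200.0], [120.0, 120.0, 180.0, 180.0]]
print(iou(nested[0], nested[1]))
```

The nested case matters for this issue: if the duplicate detection is much smaller or larger than the other box, its IoU can fall below 0.5, so a lower iou_threshold may be needed before the duplicate is dropped.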


Bing-su commented 1 month ago

It's not uncommon for machine learning models to fail to predict. Try sending them to the img2img tab and inpainting them yourself.

elistys commented 1 month ago

Thanks for the reply.

I think this is the limit of accuracy and will give up.