cubiq / ComfyUI_IPAdapter_plus

Insightface "no faces detected" simple solution #165

Closed IntendedConsequence closed 9 months ago

IntendedConsequence commented 9 months ago

Solves 95% of the "face too close, not detected" cases for me.

import numpy as np  # for the np.ndarray type annotation
from insightface.app import FaceAnalysis

class FaceAnalysis2(FaceAnalysis):
    # NOTE: allows setting det_size per detection call.
    # The detection model supports it, but the insightface wrapper
    # doesn't expose it, so people end up loading duplicate models
    # for different sizes when there is absolutely no need to.
    def get(self, img, max_num=0, det_size=(640, 640)):
        if det_size is not None:
            self.det_model.input_size = det_size

        return super().get(img, max_num)

def analyze_faces(face_analysis: FaceAnalysis, img_data: np.ndarray, det_size=(640, 640)):
    # NOTE: try to detect faces; if none are detected, lower det_size until some are.
    # None means "keep whatever det_size is currently configured on the model".
    detection_sizes = [None] + [(size, size) for size in range(640, 256, -64)] + [(256, 256)]

    for size in detection_sizes:
        faces = face_analysis.get(img_data, det_size=size)
        if len(faces) > 0:
            return faces

    return []
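
For context, a minimal usage sketch of the class above, assuming the standard insightface setup (the model pack name, providers and file path are just examples):

import cv2

face_analysis = FaceAnalysis2(name="buffalo_l", providers=["CPUExecutionProvider"])
face_analysis.prepare(ctx_id=0, det_size=(640, 640))

img = cv2.imread("portrait.jpg")            # BGR image, as insightface expects
faces = analyze_faces(face_analysis, img)   # retries at smaller detection sizes if needed
print(len(faces), "face(s) detected")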
cubiq commented 9 months ago

this is smart but gradually lowers the resolution of the face. It's better to return an error and let the user find a better picture.

IntendedConsequence commented 9 months ago

@cubiq

> this is smart but gradually lowers the resolution of the face. It's better to return an error and let the user find a better picture.

It does lower the resolution of the face, but only for the face detection model. The output of the face detection model is just the location of the face in the image: the bounding box and landmark points. That data is then used to crop and align the detected face in the original image, which is then sent to another model, arcface, for recognition. And arcface always downsizes every face crop to a small fixed input resolution (112x112, or 128x128 depending on the model) before producing an embedding.
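
For reference, a minimal sketch of that hand-off using the stock insightface package (variable names are illustrative): the detector's five landmark points are exactly what norm_crop uses to build the small aligned crop that the recognition model embeds.

from insightface.utils import face_align

faces = analyze_faces(face_analysis, img_data)   # detection: bbox + 5 landmarks per face
if faces:
    face = faces[0]
    print(face.bbox)                             # face location in the original image
    # reproduce the aligned crop fed to the recognition model; note the fixed,
    # small resolution regardless of how large the source image is
    aligned = face_align.norm_crop(img_data, landmark=face.kps, image_size=112)
    print(aligned.shape)                         # (112, 112, 3)
    print(face.normed_embedding.shape)           # arcface embedding, e.g. (512,)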

cubiq commented 9 months ago

I'll do some testing, thanks for the heads up

cubiq commented 9 months ago

okay, white padding seems to be quite effective even though it's not 100% fail-safe. Using a lower detection resolution seems to impact image quality (or at least the result) quite a bit.

With scaling: [image: "scaled"]

With padding: [image: "no_scale"]
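
For clarity, a rough sketch of the white-padding idea mentioned above, not the exact code used here (the padding amount is an arbitrary guess, and img / face_analysis are assumed from the earlier sketch):

import cv2

pad = int(0.25 * max(img.shape[:2]))   # how much border to add is an assumption
padded = cv2.copyMakeBorder(img, pad, pad, pad, pad,
                            cv2.BORDER_CONSTANT, value=(255, 255, 255))
faces = face_analysis.get(padded)      # detection runs on the padded image,
                                       # so the face occupies a smaller fraction of it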

IntendedConsequence commented 9 months ago

Since the only difference is hair color, I can assume that padding introduces more room for the crop logic in https://github.com/deepinsight/insightface/blob/01a34cd94f7b0f4a3f6c84ce4b988668ad7be329/python-package/insightface/model_zoo/arcface_onnx.py#L66:

aimg = face_align.norm_crop(img, landmark=kps, image_size=self.input_size[0])

If the image is padded, the crop is slightly bigger, which on the one hand slightly reduces the resolution of the face relative to the image size, but on the other hand it may include more hair which may result in hair color being more prominent in the arcface embedding.

This is the whole logic of it https://github.com/deepinsight/insightface/blob/01a34cd94f7b0f4a3f6c84ce4b988668ad7be329/python-package/insightface/utils/face_align.py#L27:

import cv2
import numpy as np
from skimage import transform as trans

arcface_dst = np.array(
    [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
     [41.5493, 92.3655], [70.7299, 92.2041]],
    dtype=np.float32)

def estimate_norm(lmk, image_size=112,mode='arcface'):
    assert lmk.shape == (5, 2)
    assert image_size%112==0 or image_size%128==0
    if image_size%112==0:
        ratio = float(image_size)/112.0
        diff_x = 0
    else:
        ratio = float(image_size)/128.0
        diff_x = 8.0*ratio
    dst = arcface_dst * ratio
    dst[:,0] += diff_x
    tform = trans.SimilarityTransform()
    tform.estimate(lmk, dst)
    M = tform.params[0:2, :]
    return M

def norm_crop(img, landmark, image_size=112, mode='arcface'):
    M = estimate_norm(landmark, image_size, mode)
    warped = cv2.warpAffine(img, M, (image_size, image_size), borderValue=0.0)
    return warped

The optimal solution would probably be to detect the face at any cost, so to speak, by gradually lowering the detection size, but then allow growing the detected bounding box by some percentage and give the user control over how close a crop they want: do they wish to sacrifice a bit of facial detail by including the hair color, or vice versa? The ComfyUI Impact Pack has a very convenient crop-factor widget that lets you control exactly that, i.e. how closely the bounding box is cropped, with 1.0 meaning as-is and larger factors expanding the crop area. It is more challenging here, though, since estimate_norm returns the matrix that both crops and aligns (rotates) the face, so introducing a scaled rotate+crop while keeping it centered becomes trickier. See the sketch below.
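
As a rough illustration only, here is one way a crop factor could be folded into the alignment step: shrink the destination landmarks towards the crop centre so the warp pulls in a wider area. The function name and the crop_factor parameter are hypothetical, not part of insightface.

import numpy as np
from skimage import transform as trans

def estimate_norm_with_factor(lmk, image_size=112, crop_factor=1.0):
    # same five reference landmarks as insightface's arcface_dst (112x112 template)
    dst = np.array(
        [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
         [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)
    dst *= float(image_size) / 112.0
    # crop_factor > 1.0 shrinks the target landmarks towards the centre, so the
    # same source landmarks map onto a smaller template and the resulting warp
    # includes more of the surrounding image (hair, ears, background)
    center = np.array([image_size / 2.0, image_size / 2.0], dtype=np.float32)
    dst = center + (dst - center) / crop_factor
    tform = trans.SimilarityTransform()
    tform.estimate(lmk, dst)
    return tform.params[0:2, :]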

So it is not as simple as padding the input image. Still, the gradual-detection approach offers a reasonable default that should almost always work, for photos of any size and not just face crops, and it doesn't break the creative flow by erroring out and requiring the user to fiddle with padding nodes.

cubiq commented 9 months ago

The embeds change quite a bit and the likeness seems to drop at lower resolution (it's not "just the hair"). Not sure why. Maybe internally insightface uses low-quality interpolation?

The perfect strategy would be to send a high-res image with the full body (or half-bust) and start detecting tentatively from a super closeup, slowly "zooming out". I believe this is a task for a dedicated node, not IPAdapter itself, honestly. A rough sketch of that idea follows.
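
Purely as an illustration of that zoom-out idea (the crop fractions, the naming, and the assumption that the face sits near the centre of the frame are all made up for the example):

def detect_zooming_out(face_analysis, img, fractions=(0.3, 0.5, 0.7, 1.0)):
    # run detection on progressively larger centre crops of a high-res image,
    # so the first crop that succeeds keeps the face at the highest resolution
    h, w = img.shape[:2]
    for frac in fractions:
        ch, cw = int(h * frac), int(w * frac)
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        faces = face_analysis.get(img[y0:y0 + ch, x0:x0 + cw])
        if faces:
            # bboxes/landmarks are in crop coordinates, offset by (x0, y0)
            return faces, (x0, y0)
    return [], (0, 0)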

Personally I prefer to get an error and work with a better source. Otherwise I wouldn't know what caused the low-quality generation. A solution would be to give the user the option to trigger an aggressive detection strategy, but again, if you start from a super closeup it will just be a hack.

cubiq commented 9 months ago

okay I applied the following strategy:

From my preliminary testing, if the source is good quality, 448x448 seems to be the sweet spot. If you add padding to an image that failed at 640x640, it will be detected at 512 instead of 448 (with very little quality difference). Only a few images were detected below 448.

Ideally this would be an option for the user. Maybe I'll make a node for that in the future.

This will appear in the next commit! Thanks for the insight!