Sygil-Dev / sygil-webui

Stable Diffusion web UI

Support for CodeFormer instead of GFPGAN #35

Closed · green-s closed this 2 years ago

green-s commented 2 years ago

In my experience CodeFormer can work a fair bit better than GFPGAN (aside from the lack of smooth blending with the background). May be worth implementing as an option.

hlky commented 2 years ago

aside from the lack of smooth blending with the background

This appears to be separate and could be implemented without it.

Will look into it.

Could you perhaps test dropping in the CodeFormer model as a direct replacement for src\gfpgan\experiments\pretrained_models\GFPGANv1.3.pth? I doubt it will work, but it's worth testing.

green-s commented 2 years ago

Didn't work.

RuntimeError: Error(s) in loading state_dict for GFPGANv1Clean:
        Missing key(s) in state_dict:

and

Unexpected key(s) in state_dict:

Both are followed by dozens of keys.
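For anyone who wants to see the mismatch directly, here's a quick key diff (a minimal sketch; the paths are just wherever the two checkpoints were saved locally):

    import torch

    gfpgan = torch.load('GFPGANv1.3.pth', map_location='cpu')
    codeformer = torch.load('codeformer.pth', map_location='cpu')

    # GFPGAN checkpoints keep weights under 'params_ema' (older ones under
    # 'params'); CodeFormer's checkpoint also uses 'params_ema'
    gfpgan_keys = set(gfpgan.get('params_ema', gfpgan.get('params', gfpgan)).keys())
    codeformer_keys = set(codeformer['params_ema'].keys())

    # barely anything overlaps: GFPGANv1Clean and CodeFormer are different
    # architectures, which is exactly what load_state_dict is reporting
    print('missing from CodeFormer ckpt:', len(gfpgan_keys - codeformer_keys))
    print('unexpected in CodeFormer ckpt:', len(codeformer_keys - gfpgan_keys))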

hlky commented 2 years ago

notes for implementation:

what did they mean by this?

parser.add_argument('--has_aligned', action='store_true', help='Input are cropped and aligned faces')

(going by the help text, --has_aligned just means the inputs are already cropped, aligned face crops, so the script skips detection and warping)
    # build CodeFormer and load the pretrained weights
    net = ARCH_REGISTRY.get('CodeFormer')(dim_embd=512, codebook_size=1024, n_head=8, n_layers=9,
                                          connect_list=['32', '64', '128', '256']).to(device)

    # ckpt_path = 'weights/CodeFormer/codeformer.pth'
    ckpt_path = load_file_from_url(url=pretrain_model_url['restoration'],
                                   model_dir='weights/CodeFormer', progress=True, file_name=None)
    checkpoint = torch.load(ckpt_path)['params_ema']
    net.load_state_dict(checkpoint)
    net.eval()

    # ------------------ set up FaceRestoreHelper -------------------
    # large det_model: 'YOLOv5l', 'retinaface_resnet50'
    # small det_model: 'YOLOv5n', 'retinaface_mobile0.25'
    if not args.has_aligned:
        print(f'Using [{args.detection_model}] for face detection network.')
    face_helper = FaceRestoreHelper(
        args.upscale,
        face_size=512,
        crop_ratio=(1, 1),
        det_model=args.detection_model,
        save_ext='png',
        use_parse=True,
        device=device)

    # (inside the per-image loop)
    face_helper.read_image(img)
    # get face landmarks for each face
    num_det_faces = face_helper.get_face_landmarks_5(
        only_center_face=args.only_center_face, resize=640, eye_dist_threshold=5)
    print(f'\tdetect {num_det_faces} faces')
    # align and warp each face
    face_helper.align_warp_face()
    # restore each detected face crop (w is CodeFormer's fidelity weight in [0, 1])
    for idx, cropped_face in enumerate(face_helper.cropped_faces):
        # prepare data
        cropped_face_t = img2tensor(cropped_face / 255., bgr2rgb=True, float32=True)
        normalize(cropped_face_t, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
        cropped_face_t = cropped_face_t.unsqueeze(0).to(device)

        try:
            with torch.no_grad():
                output = net(cropped_face_t, w=w, adain=True)[0]
                restored_face = tensor2img(output, rgb2bgr=True, min_max=(-1, 1))
            del output
            torch.cuda.empty_cache()
        except Exception as error:
            print(f'\tFailed inference for CodeFormer: {error}')
            restored_face = tensor2img(cropped_face_t, rgb2bgr=True, min_max=(-1, 1))

        restored_face = restored_face.astype('uint8')
        face_helper.add_restored_face(restored_face)

ignore the background upsampler stuff

    if not args.has_aligned:
        # upsample the background
        if bg_upsampler is not None:
            # Now only support RealESRGAN for upsampling background
            bg_img = bg_upsampler.enhance(img, outscale=args.upscale)[0]
        else:
            bg_img = None
        face_helper.get_inverse_affine(None)
        # paste each restored face to the input image
        restored_img = face_helper.paste_faces_to_input_image(upsample_img=bg_img, draw_box=args.draw_box)
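
Putting the notes above together, something like this could slot in as an opt-in alternative to the GFPGAN pass (a minimal sketch, assuming CodeFormer's bundled basicsr/facexlib are importable and the checkpoint is already downloaded; codeformer_restore and the w=0.5 default are hypothetical choices, not CodeFormer's API):

    import torch
    from torchvision.transforms.functional import normalize
    from basicsr.utils import img2tensor, tensor2img
    from basicsr.utils.registry import ARCH_REGISTRY
    from facexlib.utils.face_restoration_helper import FaceRestoreHelper

    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # build CodeFormer once at startup (importing basicsr registers the arch)
    net = ARCH_REGISTRY.get('CodeFormer')(dim_embd=512, codebook_size=1024, n_head=8, n_layers=9,
                                          connect_list=['32', '64', '128', '256']).to(device)
    net.load_state_dict(torch.load('weights/CodeFormer/codeformer.pth',
                                   map_location=device)['params_ema'])
    net.eval()

    face_helper = FaceRestoreHelper(
        1, face_size=512, crop_ratio=(1, 1),
        det_model='retinaface_resnet50', save_ext='png',
        use_parse=True, device=device)

    def codeformer_restore(img_bgr, w=0.5):
        # img_bgr: uint8 BGR image; returns the same image with faces restored
        face_helper.clean_all()
        face_helper.read_image(img_bgr)
        face_helper.get_face_landmarks_5(only_center_face=False, resize=640,
                                         eye_dist_threshold=5)
        face_helper.align_warp_face()
        for cropped_face in face_helper.cropped_faces:
            face_t = img2tensor(cropped_face / 255., bgr2rgb=True, float32=True)
            normalize(face_t, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
            face_t = face_t.unsqueeze(0).to(device)
            with torch.no_grad():
                output = net(face_t, w=w, adain=True)[0]
            restored = tensor2img(output, rgb2bgr=True, min_max=(-1, 1))
            face_helper.add_restored_face(restored.astype('uint8'))
        face_helper.get_inverse_affine(None)
        # skip background upsampling; paste the restored crops straight back
        return face_helper.paste_faces_to_input_image(upsample_img=None)

The existing GFPGAN call site could then branch on a flag and call codeformer_restore(sample) instead.
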
HenkDz commented 2 years ago

aside from the lack of smooth blending with the background

This appears to be separate and could be implemented without it.

Will look into it.

Could you perhaps test dropping in the CodeFormer model as a direct replacement for src\gfpgan\experiments\pretrained_models\GFPGANv1.3.pth? I doubt it will work, but it's worth testing.

Just tested it and restarted the webui. It didn't work, but strangely I got the same result as with normal GFPGAN:

[screenshot attachment]

hlky commented 2 years ago

https://github.com/hlky/stable-diffusion-webui/wiki/Upscalers