timmyhk852 opened this issue 3 weeks ago
This is a PR I drafted some time back: the plan was to ignore the bottom half of the face in the mask when you tick Face Mask Correction, so you can see the difference. https://github.com/Gourieff/sd-webui-reactor/pull/292
> this PR I drafted some time back - the plan was to ignore the bottom half of face in the mask if you click Face Mask Correction to see difference. #292

Same result after ticking Face Mask Correction.
@timmyhk852 - I tested with the tongue out and it failed. I have had results with the mouth open, though they're not good enough. I was looking at this again today - maybe I can use mediapipe to create the mask and cut the bottom of the mask at the top lip, but the results will probably still disappoint. I'm wondering if the insightface / onnx / mapper backing model is simply inadequate, or whether it needs another model to help here. The ReActor / roop stuff works fantastically, but it fails miserably in this use case. I'm wondering if inswapper_128.onnx could be translated back to PyTorch, and the result of the faceswap somehow passed back into the pipeline, like making the onnx model become a LoRA of sorts operating in the latent space.
For the simpler cosmetic approach - @Gourieff - did you use mediapipe? Not sure how to articulate this, but I'm not clear on how I could pass mediapipe coordinates to create a different mask:
```python
import mediapipe as mp

# Process the image to detect face landmarks.
self.mp_face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True, refine_landmarks=True
)
results = self.mp_face_mesh.process(image_rgb)
img_h, img_w, _ = image.shape

face_3d = []
face_2d = []
if results.multi_face_landmarks:
    for face_landmarks in results.multi_face_landmarks:
        ...
```
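To sketch the mask-cut idea, something like the following could work. This is a hedged example only: the landmark indices are a plausible upper-lip subset of the FaceMesh topology (not verified here), and `cut_mask_at_top_lip` is a hypothetical helper, not ReActor or MediaPipe API.

```python
import numpy as np

# Hedged sketch: cut an existing face mask at the top lip using
# MediaPipe-style normalized landmarks. The indices below are an assumed
# upper-lip subset of the FaceMesh topology; the helper is hypothetical.
UPPER_LIP_IDS = [0, 13, 80, 81, 82]

def cut_mask_at_top_lip(mask, landmarks, img_h):
    """Zero out everything below the highest upper-lip landmark.

    mask:      (H, W) uint8 face mask
    landmarks: sequence of (x, y) coordinates normalized to [0, 1]
    """
    ys = [landmarks[i][1] for i in UPPER_LIP_IDS]
    cut_row = int(min(ys) * img_h)  # smallest y = highest point in the image
    out = mask.copy()
    out[cut_row:, :] = 0            # keep only the mask above the lips
    return out
```

If the swapped face is then composited only where this trimmed mask is non-zero, the original mouth (tongue included) would show through.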
https://github.com/johndpope/Emote-hack/blob/main/Net.py#L941

```python
# exclusion_mask is an illustrative keyword name; the original call passed
# the constant positionally after keyword arguments, which is a syntax error
apply_face_mask_with_exclusion(
    swapped_image=swapped_image,
    target_image=result,
    target_face=target_face,
    entire_mask_image=entire_mask_image,
    exclusion_mask=MEDIA_PIPE_LANDMARK_MASK_WITH_HEAD_CUT_TO_TOP_LIPS,
)
```
This is advanced detection of lips from a project I was reviewing the other week: https://github.com/Zejun-Yang/AniPortrait/blob/cb86caa741d6ab1e119ea7ac2554eb28aabc631b/src/utils/face_landmark.py#L133
It's possible I could have this contained and wired up to just do this augmentation of the mask.
> I'm wondering if the inswapper_128.onnx could be translated back to pytorch - and the result of faceswap could be somehow passed back into pipeline, like make the onnx model become a lora of sorts operating in the latent space.
I've been thinking about this as well... We need to do a "reverse engineering" of the inswapper model to improve it and make a new model with a 256 or 512 target input (it would be great for the Community to have a really free-licensed model with HQ output), maybe with an additional masking input or, as you suggested, in the form of a LoRA.

About masking of parts... There is something like this in FaceFusion. I've not tested it with tongues, but it works with lips and teeth, so I plan to implement such segmenting for ReActor in future updates; I just need to find free time for this.
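The FaceFusion-style segmenting could look roughly like this. A hedged sketch only: it assumes a BiSeNet-style face parser that emits one integer class per pixel, and the class IDs and function name here are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of mouth-aware mask segmentation: given a face-parsing
# label map (one integer class per pixel), build the swap mask from the
# face classes minus the mouth classes. All class IDs here are assumed.
LIP_CLASSES = {11, 12, 13}  # e.g. upper lip, lower lip, inner mouth

def swap_mask_from_parsing(parsing, face_classes, exclude=LIP_CLASSES):
    """Binary mask of face_classes with the excluded mouth classes removed."""
    keep = np.isin(parsing, list(set(face_classes) - set(exclude)))
    return keep.astype(np.uint8)
```

The swap would then only be applied where this mask is 1, leaving the target's lips, teeth, and tongue untouched.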
> https://github.com/johndpope/Emote-hack/blob/main/Net.py#L941
> apply_face_mask_with_exclusion(swapped_image=swapped_image,target_image=result,target_face=target_face,entire_mask_image=entire_mask_image,MEDIA_PIPE_LANDMARK_MASK_WITH_HEAD_CUT_TO_TOP_LIPS)
> this is advance detection of lips from a project I was reviewing the other week https://github.com/Zejun-Yang/AniPortrait/blob/cb86caa741d6ab1e119ea7ac2554eb28aabc631b/src/utils/face_landmark.py#L133
> it's possible I could have this contained and wired up to just do this augmentation of mask
Hm... Rather interesting... 🧐
somewhat related - https://github.com/AtlantixJJ/PVA-CelebAHQ-IDI
had a play with ConsistentID - IT WORKS!!!! after some faffing around - https://github.com/JackAILab/ConsistentID/issues/18
> had a play with ConsistentID - IT WORKS!!!! after some faffing around - JackAILab/ConsistentID#18

I don't understand... so is it now possible for the swapped face to stick its tongue out?
> had a play with ConsistentID - IT WORKS!!!! after some faffing around - JackAILab/ConsistentID#18
Nice! I'll take a look next week. Maybe we can combine your PR with this feature; it would be super-good.
ConsistentID works by introducing a new Stable Diffusion pipeline: https://github.com/JackAILab/ConsistentID/blob/main/infer.py. I need to review other automatic1111 plugins to get my head around this flow. @Gourieff - does any plugin come to mind?

For my needs, just plugging into infer.py is fine: select the SD model, and you can add LoRAs.
```python
import os

import torch
# ConsistentIDStableDiffusionPipeline is defined in the ConsistentID repo

# TODO import base SD model and pretrained ConsistentID model
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V6.0_B1_noVAE"
# alternative base model: "philz1337/epicrealism"
consistentID_path = "./ConsistentID_model_facemask_pretrain_50w.bin"  # pretrained ConsistentID weights

# Get the absolute path of the current script
script_directory = os.path.dirname(os.path.realpath(__file__))

# Load the base model into the ConsistentID pipeline
pipe = ConsistentIDStableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    use_safetensors=False,
).to(device)
```
I had initially used Marilyn Monroe and the results were quite good, but now the jury is out: with different LoRAs and faces the results are a bit off. They have plans to increase the input from a single image to an array of faces. @timmyhk852 - basically the insightface model can't handle tongues / open mouths; we need to explore some "photoshopping" cut-and-paste work with masks: save the original, then merge the two images.
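The "photoshopping" merge described above could be sketched as a plain alpha composite; the function name and the soft mouth mask are illustrative assumptions, not existing ReActor code.

```python
import numpy as np

# Hedged sketch: paste the original mouth region back over the swapped
# face. Where mouth_mask is 1, keep the original pixels (tongue included);
# elsewhere keep the swapped result. Names here are illustrative.
def paste_mouth_back(swapped, original, mouth_mask):
    """swapped, original: (H, W, 3) float arrays; mouth_mask: (H, W) in [0, 1]."""
    alpha = mouth_mask[..., None]              # broadcast over RGB channels
    return swapped * (1.0 - alpha) + original * alpha
```

A feathered (blurred) mask would avoid a hard seam at the boundary between the two images.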
Bumping the detection threshold above 0.86 is hit and miss after the 3rd or 4th generation in a batch; even when it works, it mostly loses the mask. There is more consistency above 0.90, but by then there is no mask at all. Maybe there's something to be tweaked there?
Feature description

The tongue cannot be generated by this extension, so the person cannot stick their tongue out.