Could you check if I'm implementing it correctly?

I will add upscaling of reference frames as an option, and make conversions and encodings slower to increase quality. I could train to double the resolution, but that's a long way away. To be clear though, this tool works phenomenally with the correct source footage. It was trained on studio lighting with static cameras, so news readers are what works best. If you search for celebs reading mean tweets you will find some good vids: the lighting is great, the background is blue, and the resolution of the teeth is good due to the high-quality recording. However, they do look down at their laptops a lot, so it's best to clip them so the face is looking at the camera at all times.
I recently changed the crop radius to be consistent, so that when the face is overlaid onto the original video there is no flickering. This can cause issues if the face is moving back and forth or the camera moves too much, or, as in your case, when small and large faces appear alternately in the source footage. I will implement scene change detection so we can use a different crop radius for each scene, as well as different custom ref frames.
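To make the per-scene idea concrete, here is a rough sketch of what I have in mind. It is not what LipSick does today. The names `split_scenes` and `per_scene_crop_radius`, the `diff_scores` input (some per-frame dissimilarity to the previous frame) and the threshold value are all assumptions for illustration, and taking the largest radius in each scene is just one possible choice.

```python
from typing import List, Tuple


def split_scenes(diff_scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into scenes as (start, end) ranges, end exclusive.

    diff_scores[i] is a hypothetical dissimilarity between frame i-1 and
    frame i (diff_scores[0] is ignored). A value above `threshold` is
    treated as a cut, so frame i starts a new scene.
    """
    if not diff_scores:
        return []
    scenes = []
    start = 0
    for i in range(1, len(diff_scores)):
        if diff_scores[i] > threshold:
            scenes.append((start, i))
            start = i
    scenes.append((start, len(diff_scores)))
    return scenes


def per_scene_crop_radius(face_radii: List[int], scenes: List[Tuple[int, int]]) -> List[int]:
    """Give every frame in a scene the same crop radius.

    The radius is constant inside a scene (no flicker) but may change
    at a cut. Here it is the largest per-frame radius in that scene.
    """
    radii = []
    for start, end in scenes:
        scene_radius = max(face_radii[start:end])
        radii.extend([scene_radius] * (end - start))
    return radii


# Example: a small face for three frames, then a cut to a large face.
diff_scores = [0.0, 0.1, 0.05, 0.9, 0.1, 0.1]
face_radii = [40, 42, 41, 90, 88, 91]
scenes = split_scenes(diff_scores)
crop_radii = per_scene_crop_radius(face_radii, scenes)
```

With one global radius, the small face in the first scene would be cropped with the radius chosen for the large face. Here each scene keeps its own fixed value.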
The vid you shared appears to be OK in terms of the face in each frame. However, I wonder if the fps is the cause of the issue: when it converts to 25fps it may be doing something it shouldn't, so I will investigate that. Looking for a good source vid should take longer than inference, so in the case of Taylor I would have used https://www.youtube.com/watch?v=XnbCSboujF4 with the time stamps 6:47 to 6:51.
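For context on why the fps could matter, this is a generic illustration of how a frame-rate conversion by picking frames ends up dropping or duplicating source frames. It is only a sketch of the general idea, not the actual conversion step in the tool, and `resample_indices` is a made-up helper name.

```python
from typing import List


def resample_indices(n_frames: int, src_fps: float, dst_fps: float = 25.0) -> List[int]:
    """Return, for each output frame, the index of the source frame used.

    Each output frame i is shown at time i / dst_fps and takes the source
    frame that is on screen at that time. Assumes a constant frame rate.
    """
    n_out = int(round(n_frames * dst_fps / src_fps))
    return [min(n_frames - 1, int(i * src_fps / dst_fps)) for i in range(n_out)]


# 30fps -> 25fps: one source frame in every six is dropped.
down = resample_indices(6, 30.0)

# 10fps -> 25fps: source frames are repeated to fill the gaps.
up = resample_indices(2, 10.0)
```

If the source is not a clean multiple of 25fps, the dropped or repeated frames no longer line up evenly with the audio, which is the kind of thing I want to rule out.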

Inferencer / LipSick

Could you check if I'm implementing it correctly? #22
