How to get the inference result of apple52 blendshapes from single video file?

LizhenWangT / FaceVerse

FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (CVPR2022)

BSD 2-Clause "Simplified" License


Issue #9 (closed by lucasjinreal 2 years ago)

lucasjinreal commented 2 years ago

Hi, how can I get the inference result of the Apple 52 blendshapes? I saw that the code only has an optimization-based function, which could be very time-consuming. I thought it could be a single model that outputs all blendshapes and the rotation in a single forward pass.

LizhenWangT commented 2 years ago

The exp_tensor in model/FaceVerse.py holds the 52 expression-related parameters (blendshapes), whose names are given in model_dict['exp_name_list']. I think it's quite difficult to get stable pose and expression parameters from a single forward pass. We have demonstrated that the optimization can be accelerated to real time using CUDA-based point rendering, as shown in Fig. 4. But I'm sorry, we are not going to release this version for the time being.
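As a rough illustration of pairing the expression parameters with their names, here is a minimal sketch. It assumes one frame's expression parameters are available as a flat sequence of 52 floats and that `exp_name_list` is a list of 52 strings in the same order. The helper `name_blendshapes` and the placeholder names are hypothetical and are not part of FaceVerse's code.

```python
from typing import Dict, Sequence


def name_blendshapes(exp_values: Sequence[float],
                     exp_name_list: Sequence[str]) -> Dict[str, float]:
    """Pair each expression coefficient with its blendshape name.

    Hypothetical helper: assumes exp_values holds one frame's expression
    parameters and exp_name_list lists their names in the same order.
    """
    if len(exp_values) != len(exp_name_list):
        raise ValueError(
            f"expected {len(exp_name_list)} coefficients, got {len(exp_values)}")
    return {name: float(v) for name, v in zip(exp_name_list, exp_values)}


if __name__ == "__main__":
    # Placeholder names standing in for model_dict['exp_name_list'].
    names = [f"bs_{i:02d}" for i in range(52)]
    values = [0.0] * 52
    values[3] = 0.8
    named = name_blendshapes(values, names)
    print(named["bs_03"])  # 0.8
```

If exp_tensor turns out to be a PyTorch tensor of shape (batch, 52) (an assumption, not confirmed in the thread), one frame could be passed in as `exp_tensor[0].detach().cpu().tolist()`.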

lucasjinreal commented 2 years ago

@LizhenWangT How can an optimization-based method run in real time? How many iterations does it need? Did you measure the optimization time? The rendering part is OK; I don't think it takes the most time.

LizhenWangT commented 2 years ago

Actually, the most time-consuming part in this version is the surface-based differentiable rendering: we need to render an image at every "fitting with differentiable rendering" step, and the backward pass is also quite time-consuming. We need to render more than 10 images to optimize each frame. In the accelerated version, we use multi-process and multi-batch optimization. We use 15 iterations for "fitting only using landmarks" and 10 iterations for "fitting with differentiable rendering" (which is changed to CUDA-based point rendering rather than PyTorch3D-based surface rendering), which results in 0.06 s per frame when batchsize=1 and 0.033 s per frame on average when batchsize=3. Since the optimization takes the most time, the other steps can be handled in other threads. We finally get a tracking algorithm that runs at 0.033 s per frame with about 0.2 s of delay. By the way, we use Jittor rather than PyTorch in the accelerated version.
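The timing figures in the comment above can be sanity-checked with a little arithmetic. This sketch assumes that "0.033s in average when batchsize=3" is the amortized time per frame, so one batch of three frames takes about three times that, and that the 0.2 s delay is end-to-end latency. The function name `throughput` is illustrative only.

```python
def throughput(per_frame_s: float, batch_size: int) -> dict:
    """Convert an amortized per-frame time into fps and wall-clock time per batch."""
    return {
        "fps": 1.0 / per_frame_s,
        "batch_time_s": per_frame_s * batch_size,
    }


if __name__ == "__main__":
    # Iteration counts quoted in the comment.
    iters_landmark = 15  # "fitting only using landmarks"
    iters_render = 10    # "fitting with differentiable rendering"
    print(iters_landmark + iters_render)  # 25 optimization steps per frame

    single = throughput(0.06, 1)
    batched = throughput(0.033, 3)
    print(f"batch=1: {single['fps']:.1f} fps")  # batch=1: 16.7 fps
    print(f"batch=3: {batched['fps']:.1f} fps, "
          f"{batched['batch_time_s']:.3f} s per batch")  # batch=3: 30.3 fps, 0.099 s per batch

    # Assumed end-to-end delay of 0.2 s expressed in frames at 0.033 s/frame.
    print(round(0.2 / 0.033))  # 6
```

In other words, batching three frames trades a longer wall-clock time per batch for higher overall throughput, which is consistent with the roughly 30 fps and small fixed delay described above.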

lucasjinreal commented 2 years ago

I see... I am looking for a learning-based method, though. We already have a single-forward-pass version that predicts the 52 blendshapes within 0.005 s, but I need something better; optimization-based methods are not my direction.

enjoybo commented 2 years ago

Hello, is your single-forward model based on the FaceVerse basis?

MarcoG5 commented 2 years ago

Hello, is there any learning-based method for blendshape prediction that you can share? I am interested in this kind of method.

enjoybo commented 2 years ago

This is an automatic reply from QQ Mail. Your email has been received, thank you!