Sxjdwang / TalkLip

The results of the generation are not aligned. Why do you need to adjust the bbx when post-processing the fused face? #34

Closed: zhenyingfang closed this issue 1 year ago

zhenyingfang commented 1 year ago

In line 178 of the file utils/data_avhubert.py, adjusting bbx causes the results to be misaligned.

Sxjdwang commented 1 year ago

Since the sizes of faces vary, and our model only takes images of 96x96 as inputs, we crop faces and resize them to 96x96 before feeding them to the model. Therefore, after synthesis, we need to resize the outputs back to the original sizes of faces.
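
The crop, resize, and paste-back round trip described above can be sketched roughly as follows. This is an illustration only, not the repository's code: the helper names (`resize_nearest`, `crop`, `to_model_input`, `paste_back`) are hypothetical, the bbx layout `(x1, y1, x2, y2)` is an assumption, and a dependency-free nearest-neighbour resize on nested lists stands in for whatever image-library resize the project actually uses.

```python
FACE_SIZE = 96  # model input size stated in the thread


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list (stand-in for a real image resize)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def crop(img, bbx):
    """Cut the face region out of the frame; bbx is assumed to be (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbx
    return [row[x1:x2] for row in img[y1:y2]]


def to_model_input(frame, bbx):
    """Crop the face and resize it to the fixed FACE_SIZE x FACE_SIZE input."""
    return resize_nearest(crop(frame, bbx), FACE_SIZE, FACE_SIZE)


def paste_back(frame, face, bbx):
    """Resize a FACE_SIZE x FACE_SIZE output to the bbx size and paste it into a copy of the frame."""
    x1, y1, x2, y2 = bbx
    restored = resize_nearest(face, y2 - y1, x2 - x1)
    out = [row[:] for row in frame]  # leave the input frame untouched
    for r in range(y2 - y1):
        out[y1 + r][x1:x2] = restored[r]
    return out
```

The point of the sketch is that `paste_back` must use exactly the same bbx that `crop` used. If the bbx is modified in between, the resized output no longer covers the region the face was taken from.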

zhenyingfang commented 1 year ago

I understand that the resize operation is required. However, if the height of the bbox is greater than the width of the image, Line 178 (if bbx[3] > width: bbx[3] = width) changes the size of the original bbox, so the result can no longer be aligned with the original face when it is pasted back into the original image. I think Line 178 should be changed to if bbx[3] > height: bbx[3] = height, or the line should be commented out. This is what I observed when testing on my own data.
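
A minimal sketch of the difference between the two checks, assuming the bbx layout is `[x1, y1, x2, y2]` so that `bbx[3]` is the bottom edge. The function names are hypothetical and only the `bbx[3]` check quoted above is modelled; the rest of the file is not reproduced here.

```python
def clamp_bottom_to_width(bbx, width, height):
    """Check as quoted from Line 178: the bottom edge is compared against the image width."""
    bbx = list(bbx)  # work on a copy
    if bbx[3] > width:
        bbx[3] = width
    return bbx


def clamp_bottom_to_height(bbx, width, height):
    """Proposed fix: the bottom edge is compared against the image height."""
    bbx = list(bbx)
    if bbx[3] > height:
        bbx[3] = height
    return bbx


# Portrait frame: taller than it is wide (values chosen for illustration).
width, height = 100, 200
bbx = [10, 50, 90, 150]  # fully inside the frame, but bottom edge > width

shrunk = clamp_bottom_to_width(bbx, width, height)
kept = clamp_bottom_to_height(bbx, width, height)
```

With these illustrative numbers, the width-based check cuts the bbox height from 100 to 50 rows even though the box lies entirely inside the frame, while the height-based check leaves it unchanged. On square frames the two checks behave identically, which may be why the problem only shows up on some data.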

Sxjdwang commented 1 year ago

You are right: the width and height are not necessarily the same. I appreciate your comments and have changed the corresponding code.