I wanted to express my appreciation for your two papers, "Interactive Face Video Coding: A Generative Compression Framework" and "Beyond Keypoint Coding: Temporal Evolution Inference with Compact Feature Representation for Talking Face Video Compression". Both works are excellent, and the results are impressive.
I am interested in learning more about the rate-distortion results. Could you please provide further details on how the number of bits plotted on the x-axis is calculated? For example, the VVC settings, the number of bits FOMM and face-vid2vid use for each frame, the frames-per-second (FPS) setting used in the test, etc. I have tested the same test clips using H.264, H.265, and VVC, and my results show better VVC R-D performance than what is presented in the paper. Additionally, I noticed a small mistake in the paper: the clips presented in Figure 6 seem to be from VoxCeleb1, not VoxCeleb2.
Sorry, I am not sure what you mean by the x-axis. Could you describe it in more detail? As for the implementation details, you can download the supplementary materials, where we provide the QP settings. Regarding VVC, we use the VTM10.0 LDB mode to compress RGB444 video at a resolution of 256*256 with 250 frames. For FOMM and Face_vid2vid, we compress only the first frame with the VVC codec; the remaining inter frames are represented as facial parameters, to which inter-prediction and quantization are applied.
As for your second question about the VVC RD performance, I would like to point out that our proposed systems are designed for ultra-low bit rate communication scenarios. As such, the VVC anchor we compare against is compressed with large QPs. I don't know which QPs you used in your tests. Could you please provide further information?
Thank you for the kind reminder. We will correct this error in a later revision.
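To make the x-axis question concrete, here is a minimal sketch of how an average bitrate could be derived from the setup described above (one VVC-coded key frame plus compact parameters for the remaining frames). It is not the authors' evaluation script: the function names, the 25 FPS value, and the bit counts in the usage example are all hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def total_sequence_bits(key_frame_bits: int, inter_frame_bits: Sequence[int]) -> int:
    """Total bits for a sequence: the VVC-coded first frame plus the
    coded facial parameters of every subsequent (inter) frame."""
    return key_frame_bits + sum(inter_frame_bits)


def bitrate_kbps(total_bits: int, num_frames: int, fps: float) -> float:
    """Average bitrate in kbps, given the total bits spent on num_frames
    frames played back at fps frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    duration_s = num_frames / fps  # sequence duration in seconds
    return total_bits / duration_s / 1000.0


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the papers:
    # an 8000-bit key frame and 249 inter frames at 40 bits each.
    bits = total_sequence_bits(8000, [40] * 249)
    # 25 FPS is an assumption; the thread does not state the FPS used.
    print(f"{bitrate_kbps(bits, num_frames=250, fps=25.0):.3f} kbps")
```

The same `bitrate_kbps` helper applies to a conventional codec anchor by passing the size of the whole encoded bitstream in bits, so both kinds of codec end up on the same axis only if the same frame count and FPS are used for each.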