gaohan-12 / SPME

Structure-Preserving Motion Estimation for Learned Video Compression

train code #1

Open · xuezhongcailian opened this issue 2 years ago

xuezhongcailian commented 2 years ago

Hello, and thank you for your work and code. Could you share the training code?

KippQin commented 2 years ago

Dear author,

Hello, and thank you very much for sharing this great work. I have a question I would like to ask.

On the encoder side, the encoder can use the original frame Ft-1 as auxiliary information, but on the decoder side only the reconstructed frame Ft-1 is available as a reference. How can the decoder access the original frame Ft-1? (The decoder receives only the bitstream, from which it can recover only the reconstructed reference frame Ft-1, the motion vectors, and the residual; the original frame Ft-1 cannot be obtained.)

gaohan-12 commented 2 years ago


Hi, thanks for your interest in our work. We'll release the training code soon.

gaohan-12 commented 2 years ago


Hi, thanks for your insightful question and for your interest in our work. As we stated in the Appendix, the original frame Ft-1 is not available at the decoder side, so many readers may think there is a mismatch (non-correspondence) at the motion compensation stage.

First, this "reference mismatch" exists only at the encoder side: the original reference is used only for motion estimation, while the estimated motion fields are transmitted and applied to the same decoded reference for motion compensation in both the encoder and the decoder. The decoded quality is therefore not affected.

Second, we explain below why this mismatch between motion estimation and motion compensation does not degrade the rate-distortion (RD) performance.

(1) Structure information is helpful for CNN-based coding. We would like to emphasize the difference between conventional video coding and deep-learning-based coding. In conventional coding, smaller residuals usually lead to a lower rate, but in CNN-based coding the structure of both the motion fields and the residuals matters a lot. Our method provides structure information to motion estimation and to the corresponding residual within the RD optimization framework, thus improving the overall performance.

To see this, first consider an extreme case: if all pixels in the decoded reference frame are zero, only intra prediction is needed and the motion field is not used at all. Now consider a realistic case in which the decoded reference frame is distorted and very blurry. During motion estimation, many candidate motion vectors for each pixel yield similar residuals, and which one is selected as best depends on the distortion, which behaves like random noise. Consequently, the estimated motion fields are rather random, and so, to some extent, are the resulting residuals. This significantly increases the bit rate for CNN-based coding and lowers the RD performance. By contrast, by using the original reference frame, our method produces a structured (though blurry) motion field that requires far fewer bits; the resulting residuals are slightly larger but also structured.
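The claim that a structured motion field needs far fewer bits than a randomly fluctuating one can be illustrated with a toy rate proxy. This is a stand-in, not the paper's method: SPME codes motion with a learned CNN entropy model, while the sketch below predicts each vector from its left neighbour and charges the signed Exp-Golomb length of the prediction error, as conventional codecs do. The function names (`se_bits`, `motion_field_bits`) are hypothetical, and the two motion fields are invented for illustration, not measured.

```python
"""Hypothetical rate proxy for a motion field (not SPME's learned entropy model).

Each vector is predicted from its left neighbour and the prediction error is
charged the length of its signed Exp-Golomb code, as in conventional codecs.
"""
from typing import List


def se_bits(v: int) -> int:
    """Bit length of the signed Exp-Golomb code se(v)."""
    k = 2 * v - 1 if v > 0 else -2 * v   # map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ...
    return 2 * (k + 1).bit_length() - 1  # ue(k) length: 2 * floor(log2(k + 1)) + 1


def motion_field_bits(mvs: List[int]) -> int:
    """Total bits when each vector is coded as a difference from its predecessor."""
    bits, prev = 0, 0
    for mv in mvs:
        bits += se_bits(mv - prev)
        prev = mv
    return bits


# Invented example fields (assumptions, not data from the paper).
structured = [1, 1, 1, 1, 1, 1, 1, 1]   # smooth field, as from the original reference
irregular = [1, -1, 2, 0, 1, -2, 1, 0]  # noisy field, as from a blurry decoded reference

print(motion_field_bits(structured))  # 10
print(motion_field_bits(irregular))   # 34
```

The irregular field uses vectors of similar magnitude, yet it costs more than three times as many bits because every neighbour difference is non-zero; a spatially consistent field leaves almost nothing to code once its structure is predicted.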
The slightly larger but structured residuals can also be encoded efficiently, given the large improvements achieved by existing compressed video quality enhancement methods. Furthermore, our method shields the temporal correlation from the distortion, which improves the efficiency of temporal prediction over the motion fields and reduces error propagation along the temporal dimension, as shown in Fig. 8, improving the overall coding performance.

(2) Given the example and analysis above, the mismatch does not substantially increase the residuals, while the structured motion fields and corresponding residuals significantly improve the overall coding performance. In addition, the lower the quality of the decoded reference (i.e., the more information is missing), the more important the structure information provided by the original reference frame becomes, and the larger the improvement. This is validated in Fig. 7, where the improvement at the low bit-rate end is larger than at the high bit-rate end.
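A minimal sketch of the first point of the reply (the mismatch exists only at the encoder), under loudly stated assumptions: SPME estimates motion with a learned network on 2-D frames, whereas this toy uses 1-D "frames", one integer motion vector per block, full-search block matching, and a residual kept losslessly. All names (`estimate_motion`, `compensate`, `encode`, `decode`, `BLOCK`, `SEARCH`) are hypothetical. It shows that the original reference enters only the encoder's motion estimation, while both sides perform motion compensation on the same decoded reference.

```python
"""Toy 1-D sketch of the encoder/decoder split (hypothetical, not SPME code).

Motion estimation (ME) uses the ORIGINAL reference and runs only in the encoder.
Motion compensation (MC) uses the DECODED reference in both encoder and decoder.
"""
from typing import List, Tuple

BLOCK = 4   # hypothetical block size (pixels)
SEARCH = 2  # hypothetical search range (+/- pixels)


def fetch_block(ref: List[int], start: int, mv: int) -> List[int]:
    """Return BLOCK pixels of ref starting at start + mv, clamping at the borders."""
    last = len(ref) - 1
    return [ref[min(max(start + mv + i, 0), last)] for i in range(BLOCK)]


def estimate_motion(cur: List[int], ref: List[int]) -> List[int]:
    """Full-search block matching with SAD; ties are broken toward the smallest |mv|."""
    mvs = []
    for start in range(0, len(cur), BLOCK):
        block = cur[start:start + BLOCK]

        def cost(mv: int) -> Tuple[int, int]:
            sad = sum(abs(a - b) for a, b in zip(block, fetch_block(ref, start, mv)))
            return sad, abs(mv)

        mvs.append(min(range(-SEARCH, SEARCH + 1), key=cost))
    return mvs


def compensate(ref: List[int], mvs: List[int]) -> List[int]:
    """Build the prediction by fetching one motion-shifted block per vector."""
    pred: List[int] = []
    for b, mv in enumerate(mvs):
        pred.extend(fetch_block(ref, b * BLOCK, mv))
    return pred


def encode(cur: List[int], orig_ref: List[int], dec_ref: List[int]) -> Tuple[List[int], List[int]]:
    mvs = estimate_motion(cur, orig_ref)  # ME: original reference (encoder only)
    pred = compensate(dec_ref, mvs)       # MC: decoded reference
    # Residual kept lossless for brevity (assumption); a real codec quantizes it.
    residual = [c - p for c, p in zip(cur, pred)]
    return mvs, residual                  # what the bitstream would carry


def decode(mvs: List[int], residual: List[int], dec_ref: List[int]) -> List[int]:
    pred = compensate(dec_ref, mvs)       # same vectors, same decoded reference
    return [p + r for p, r in zip(pred, residual)]


# Demo: the current frame is the original reference shifted left by one pixel.
orig_ref = list(range(0, 32, 2))                     # 16 pixels: 0, 2, ..., 30
cur = [orig_ref[min(i + 1, 15)] for i in range(16)]  # 2, 4, ..., 30, 30
dec_ref = [v + (1 if i % 3 == 0 else -1) for i, v in enumerate(orig_ref)]  # toy coding noise

mvs, residual = encode(cur, orig_ref, dec_ref)
print(mvs)                                    # [1, 1, 1, 1]
print(decode(mvs, residual, dec_ref) == cur)  # True
```

Because `decode` takes only the motion vectors, the residual, and the decoded reference, estimating against the original frame changes what is transmitted but cannot cause encoder/decoder drift. With a quantized residual, both sides would still add the same dequantized residual to the same prediction.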