chaoyuaw / pytorch-vcii

Video Compression through Image Interpolation (ECCV'18) [PyTorch]
https://chaoyuaw.github.io/vcii/
GNU Lesser General Public License v2.1
209 stars 38 forks

Questions about the training part #11

Closed wensihan closed 5 years ago

wensihan commented 5 years ago

Hello, Chaoyuan:

I am confused about the training steps. Do you mean that the I-frames (one every 12 frames) are compressed with the image-compression model, and then the video-compression model is run with hier from 0 to 2, step by step, to obtain all the reference frames? And does this mean we need four models to reconstruct the whole video?

I also found that if I set v_compress to False, the code does not run. Can this code be used to compress single images?

Looking forward to your reply. Thank you~

chaoyuaw commented 5 years ago

Hi @wensihan , thanks for your questions.

Yes, there are 4 models in total: an I-frame model and three interpolation models (one for each level of the hierarchy). At training time, we train the interpolation models using ground-truth images as the reference frames. At evaluation time, the algorithm must follow the hierarchy order, and we use the frames reconstructed at the previous level as references when interpolating frames at the current level.

I think the I-frame model should work, but I haven't tested it thoroughly for this publicly released version. Maybe some small modifications are needed, but all the main components (data loader, model architecture, optimizer, etc.) should be there.
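A minimal sketch of this evaluation-time decoding order for one GOP (the GOP length of 12, the per-level frame spacing, and the model call signatures are assumptions for illustration, not taken from the repository):

```python
# Sketch of the evaluation-time decoding order for one GOP.
# GOP length, per-level spacing, and model interfaces are assumed.

def decode_gop(i_frame_model, interp_models, frames):
    """Reconstruct frames 0..12 of one GOP hierarchically.

    i_frame_model : image-compression model used for the two I-frames.
    interp_models : dict {level: interpolation model}, one per hierarchy level.
    frames        : original frames, used only as input to the encoders.
    """
    recon = {}

    # The I-frames bounding the GOP are coded independently.
    for idx in (0, 12):
        recon[idx] = i_frame_model(frames[idx])

    # Each level interpolates its target frames from frames reconstructed at
    # coarser levels, so evaluation must follow this order.
    schedule = [
        (0, [(0, 6, 12)]),                                      # level 0
        (1, [(0, 3, 6), (6, 9, 12)]),                           # level 1
        (2, [(0, 1, 3), (0, 2, 3), (3, 4, 6), (3, 5, 6),
             (6, 7, 9), (6, 8, 9), (9, 10, 12), (9, 11, 12)]),  # level 2
    ]
    for level, triples in schedule:
        model = interp_models[level]
        for left, target, right in triples:
            # References are reconstructions, not ground truth, at test time.
            recon[target] = model(recon[left], frames[target], recon[right])

    return [recon[i] for i in range(13)]
```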

wensihan commented 5 years ago

Chaoyuan,

Thank you very much for your reply. Now I understand your work clearly; I really appreciate your sharing it~

Wen

wensihan commented 5 years ago

Hello, Chaoyuan:

Sorry to trouble you again.

I have a question about the network. When v_compress and stack are set to True, the input of the encoder is [frame1, res, frame2]. Does this mean we need to compress the current frame together with the two reference frames? I think res should be the residual, but it seems to stand for the current inter frame. If so, the inter-frame coding could also be seen as image compression. I am a bit confused.

Wen

chaoyuaw commented 5 years ago

Hi Wen, thanks for your question!

I'm not entirely sure I understand your question correctly. frame1 and frame2 are de-compressed RGB frames. At decoding time, they come from the previous level of the "hierarchy". At training time, we use the original "lossless" RGB images for frame1 and frame2 for simplicity.
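A rough sketch of how that stacked encoder input could be assembled (the tensor names, the channel-wise concatenation, and treating the middle slot as the current frame are assumptions based on the discussion above, not verified against the repository code):

```python
import torch

def build_encoder_input(frame1, current, frame2):
    """Assemble the stacked encoder input (assumed layout).

    frame1 / frame2 : reference RGB frames (reconstructions at decoding time,
                      original lossless frames at training time).
    current         : the frame being coded at this hierarchy level.
    """
    # Concatenate along the channel dimension so the encoder sees both
    # references next to the frame it has to code; the bottleneck then only
    # needs to carry what interpolation from the references cannot predict.
    return torch.cat([frame1, current, frame2], dim=1)
```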