KUIS-AI-Tekalp-Research-Group / video-compression

Research on Video Compression

help request #1

Closed KippQin closed 2 years ago

KippQin commented 2 years ago

Dear author,

First of all, I am sorry to disturb you. I was honored to read your paper (i.e. LHBDC) in the TIP journal, and I admire your great contribution to video compression. In the paper, I found a GitHub link (https://github.com/makinyilmaz/LHBDC/) to the source code implementing the LHBDC method, which I want to use for comparative experiments. However, the following errors occurred while running the code, as shown in Fig. 1 of the attached file.

Therefore, we suspected a problem when loading the pre-trained model, and modified one line in the original encode_B.py and decode_B.py files, changing 'model.load_state_dict(torch.load(f"pretrained/compression{args.l}.pth", map_location=lambda storage, loc: storage)["state_dict"])' to 'model.load_state_dict(torch.load(f"pretrained/compression{args.l}.pth", map_location=lambda storage, loc: storage)["state_dict"], False)', i.e. passing strict=False to load_state_dict.
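For context, passing False as the second positional argument sets strict=False, which makes PyTorch silently skip mismatched keys instead of raising. A minimal sketch (a toy model, nothing to do with the LHBDC network itself) of what that hides:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)
state = model.state_dict()
state["stale_key"] = torch.zeros(1)  # simulate a key the model does not have

# strict=True (the default) raises on any mismatch
try:
    model.load_state_dict(state)
except RuntimeError:
    print("strict load rejected the unexpected key")

# strict=False loads what it can and only *reports* the mismatch
result = model.load_state_dict(state, strict=False)
print("unexpected keys:", result.unexpected_keys)
```

Any parameter listed in result.missing_keys is left at its random initial value, which is exactly how a strict=False load can "run without error" yet decode garbage.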

We then ran the code again and it completed without error. However, when the bit_B.bin file output by the encoder is used for decoding, the reconstructed image produced by the decoder is wrong, as shown in Fig. 2 of the attached file.

So, I would like to ask whether you have encountered errors like the above when running your code in the past. If so, how did you solve them? I really look forward to your answers and would greatly appreciate your help. Attach file.docx

Thanks again.

makinyilmaz commented 2 years ago

Thanks for your kind comments. I think the problem is caused by an older version of the compressai library, which automatically clamps the output to [0, 1] after the decompress method is called. We have added a full test script and the environment used in our experiments. Could you please try now and let us know if you still experience an error? I would like to note that the default B model is lambda 1626 and the I model is set to quality 7. If you want to try the other lambda models found in the drive link, please adjust the I-model quality accordingly: set it to 6 for lambda 845, 8 for lambda 3141, etc.
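To illustrate the clamping issue in general terms (a toy tensor, not the actual LHBDC decoder output): B-frame coding transmits residual-like signals that can be negative or exceed 1, so an implicit clamp to [0, 1] after decompression destroys information.

```python
import torch

# Toy "decoded" values; a residual signal is naturally signed
decoded = torch.tensor([-0.3, 0.2, 1.4])

# What the older compressai behaviour amounted to
clamped = decoded.clamp(0.0, 1.0)

print(clamped)  # -0.3 and 1.4 are irrecoverably squashed to 0.0 and 1.0
```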

Please let me know your results.

Best

KippQin commented 2 years ago

Dear Yılmaz​,

I am very happy to receive your reply. Thank you very much for your help. I will set up the test environment you provided and try running the encoder and decoder again. If I encounter related errors, I will report them to you.

Best Regards

makinyilmaz commented 2 years ago

I recently found a typo in the decode_b.py script and updated the encode and decode functions. In addition, I fixed the clamping issue after the decoding operations; you can see the difference in the model/layers.py script. I think you should now be able to produce valid results. One last thing: please re-download the pretrained models, since I updated them for the newer compressai version.

Best

makinyilmaz commented 2 years ago

With the updated code, I suppose you are now able to produce valid results. I am closing the issue for now. If you encounter another problem, please feel free to open a new issue.

Best

KippQin commented 2 years ago

Dear Yılmaz​,

I am very happy to receive your reply. Thank you very much for your help. I followed the test environment you provided and tried running testing.py again. I found that for the same sequence and under the same configuration, the output bit size and PSNR differ on each run of testing.py. At the same time, the compressed bits of the B frames are very large (e.g. 206433 bits for one B frame at lambda 1626) and even exceed the bits of the I frames.

Best Regards

KippQin commented 2 years ago

Dear Yılmaz​,

Sorry for my sudden interruption. Could you tell me how FlowNet loads its weights? Do I need to download SPyNet's pretrained model when running testing.py, encoder.py, and decoder.py?

If I use the following statement, it throws errors in FlowNet: b_model.load_state_dict(torch.load(args.b_pretrained, map_location=lambda storage, loc: storage)["state_dict"])

Therefore, I can only use a statement like: b_model.load_state_dict(torch.load(args.b_pretrained, map_location=lambda storage, loc: storage)["state_dict"], False)

But does doing so have any impact on encoding performance?

Best Regards

makinyilmaz commented 2 years ago

Dear KippQin,

Thank you for your comments. I re-downloaded and ran the encode-decode scripts and observed 12736 bytes when lambda is set to 1626. The size remains unchanged as I run the scripts multiple times. In addition, I also ran the testing.py script on the beauty sequence two times. Below you may see the results.

INFO:root:Current video: beauty
INFO:root:Shape: 1080x1920
INFO:root:
INFO:root:----- Per video, level pair -----
INFO:root:Video: beauty, Level: 7 PSNR: 35.2488967481879
INFO:root:Video: beauty, Level: 7 bpp: 0.28364649766032474
INFO:root:
INFO:root:----- Per video, level, frame type pair -----
INFO:root:Video: beauty, Frame: B, Level: 7 PSNR: 35.06443562836365
INFO:root:Video: beauty, Frame: B, Level: 7 bpp: 0.23144761391900687
INFO:root:Video: beauty, Frame: I, Level: 7 PSNR: 36.52290821577412
INFO:root:Video: beauty, Frame: I, Level: 7 bpp: 0.6441667880336934
INFO:root:*****

INFO:root:Current video: beauty
INFO:root:Shape: 1080x1920
INFO:root:
INFO:root:----- Per video, level pair -----
INFO:root:Video: beauty, Level: 7 PSNR: 35.248898268715116
INFO:root:Video: beauty, Level: 7 bpp: 0.2836471825046264
INFO:root:
INFO:root:----- Per video, level, frame type pair -----
INFO:root:Video: beauty, Frame: B, Level: 7 PSNR: 35.0644373147483
INFO:root:Video: beauty, Frame: B, Level: 7 bpp: 0.23144839629106864
INFO:root:Video: beauty, Frame: I, Level: 7 PSNR: 36.52290859077925
INFO:root:Video: beauty, Frame: I, Level: 7 bpp: 0.6441667992862654
INFO:root:*****

As you can see, the difference in PSNR and bpp values between runs is negligible and looks like floating-point precision error.
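The run-to-run gap can be checked numerically; the two per-video PSNR readings from the logs agree to about seven significant digits, which is consistent with floating-point accumulation order rather than real nondeterminism:

```python
import math

# PSNR values taken from the two runs logged above
psnr_run1 = 35.2488967481879
psnr_run2 = 35.248898268715116

# relative difference is on the order of 1e-8, far below any visible quality change
assert math.isclose(psnr_run1, psnr_run2, rel_tol=1e-6)
print(abs(psnr_run1 - psnr_run2))
```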

Regarding the FlowNet module, you do not need to download weights for SPyNet, since we fine-tuned it and the checkpoints already contain its weights.
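One way to confirm this yourself is to inspect the checkpoint's state_dict keys: if the flow-estimation submodule's parameters appear under its prefix, no separate SPyNet download is needed. A sketch with a made-up state_dict (the key names here are illustrative; the real prefix depends on the LHBDC module naming):

```python
def has_submodule_weights(state_dict, prefix):
    """True if any key in the state_dict belongs to the given submodule prefix."""
    return any(k.startswith(prefix + ".") for k in state_dict)

# Stand-in for torch.load(path)["state_dict"]; tensor values elided
fake_state_dict = {
    "flow_net.conv1.weight": None,
    "flow_net.conv1.bias": None,
    "entropy_model.weight": None,
}

print(has_submodule_weights(fake_state_dict, "flow_net"))    # True
print(has_submodule_weights(fake_state_dict, "spynet_ext"))  # False
```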

Could you please re-download the repo and the pre-trained weights from the drive link and re-try the experiment? Let's keep in touch!

Best

KippQin commented 2 years ago

Dear Yılmaz​,

There is one more question I need to ask. Does your model place constraints on the test image size, for example that it must be a multiple of 16 or a multiple of 64?

Best

makinyilmaz commented 2 years ago

Yes, the input image size must be a multiple of 64. This requirement is handled by applying reflection padding; you may check the utils.py script. Also, in encode.py there is a _processframe function that pads the input, so you can pass inputs of any resolution.
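A minimal sketch of reflection padding to a multiple of 64 (my own helper for illustration, not the repo's actual _processframe implementation):

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple=64):
    """Pad an (N, C, H, W) tensor on the bottom/right so H and W become
    multiples of `multiple`, using reflection padding."""
    h, w = x.shape[-2:]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    padded = F.pad(x, (0, pad_w, 0, pad_h), mode="reflect")
    return padded, (h, w)  # keep the original size to crop after decoding

x = torch.randn(1, 3, 1080, 1920)   # 1080 is not a multiple of 64
padded, (h, w) = pad_to_multiple(x)
print(padded.shape)                 # torch.Size([1, 3, 1088, 1920])
restored = padded[..., :h, :w]      # crop back after decoding
```

Cropping with the stored original size recovers the input region exactly, since padding only extends the bottom and right borders.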

Best