the size of datasets of video_encoder

Orange-OpenSource / Cool-Chic

Low-complexity neural image & video codec.

https://orange-opensource.github.io/Cool-Chic/

BSD 3-Clause "New" or "Revised" License


the size of datasets of video_encoder #7

Closed: smileto1 closed this issue 7 months ago

smileto1 commented 7 months ago

Hello. When I train on the CIFAR dataset with 1920*1080 YUV 4:2:0 video data, encoding it with the provided script, nn_bpp stays at zero. As a result the loss stays constant and training does not converge.

smileto1 commented 7 months ago

Sir, this picture shows the printed values of the predicted frame in the code. It is a screenshot taken while training on a YUV420 video at 1920*1080 resolution. Every value in the predicted frame is 1, which makes the MSE calculation wrong and stops the loss from converging. Sorry to bother you, but I can't reproduce the results from your original article.
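A quick way to narrow down the symptom described above is to confirm that the raw input file itself holds varied sample values before it reaches the encoder. The following is a minimal sketch, not part of Cool-Chic: it assumes 8-bit planar YUV 4:2:0 with the Y plane followed by U and then V, and the helper names `yuv420_frame_bytes` and `plane_ranges` are hypothetical.

```python
def yuv420_frame_bytes(width: int, height: int, bytes_per_sample: int = 1) -> int:
    """Size in bytes of one planar YUV 4:2:0 frame (assumed layout: Y, U, V)."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # size of U, and of V
    return (luma + 2 * chroma) * bytes_per_sample


def plane_ranges(raw: bytes, width: int, height: int) -> dict:
    """Return the (min, max) sample value of each plane of one 8-bit frame."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    if len(raw) < luma + 2 * chroma:
        raise ValueError("buffer is shorter than one full frame")
    planes = {
        "Y": raw[:luma],
        "U": raw[luma:luma + chroma],
        "V": raw[luma + chroma:luma + 2 * chroma],
    }
    # A plane whose min equals its max is constant, which would point
    # to a problem with the input rather than with the training itself.
    return {name: (min(data), max(data)) for name, data in planes.items()}


if __name__ == "__main__":
    # One 1920x1080 frame: 2,073,600 luma bytes + 2 x 518,400 chroma bytes.
    print(yuv420_frame_bytes(1920, 1080))
```

To check a real file, read `yuv420_frame_bytes(width, height)` bytes from it in binary mode and pass them to `plane_ranges`. If the file size is not a multiple of the frame size, the resolution or bit depth passed to the encoder probably does not match the file.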

theoladune commented 7 months ago

Hello,

nn_bpp is the rate associated with the different neural networks. As explained in the COOL-CHIC paper, it is not optimized by the training process, unlike rate_latent_bpp, the rate associated with the latent variables, so a constant nn_bpp should not be an issue.

Please attach the sequence you'd like to encode and the command line you're using, and I'll try it on my side to see what's wrong.

Théo
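The distinction made in the reply above can be illustrated with a generic rate-distortion objective. This is only a sketch under assumptions, not the actual Cool-Chic loss: the function names, the `lmbda` parameter and the plain sum for the total rate are hypothetical, and the real code may scale or combine the terms differently.

```python
def training_loss(mse: float, rate_latent_bpp: float, lmbda: float) -> float:
    """Assumed objective: distortion plus lambda times the latent rate.

    nn_bpp is deliberately not an argument. It is not optimized by
    training, so it cannot change this value.
    """
    return mse + lmbda * rate_latent_bpp


def reported_total_bpp(rate_latent_bpp: float, nn_bpp: float) -> float:
    """Assumed total rate for reporting: latent rate plus network rate."""
    return rate_latent_bpp + nn_bpp


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    loss = training_loss(mse=0.5, rate_latent_bpp=2.0, lmbda=0.01)
    total = reported_total_bpp(rate_latent_bpp=2.0, nn_bpp=0.0)
    print(loss, total)
```

Under these assumptions a constant nn_bpp leaves the optimization unaffected, so a loss that does not move has to come from the distortion term or the latent rate.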

smileto1 commented 7 months ago

Thank you for your answer. The problem has been solved. Sorry to have disturbed you during the break.