Open z1296 opened 2 years ago
I am writing the training code too. Waiting for the reply of the authors too. Thank you.
Nice to see your comment. Did your training process go well?
I am under training. And I still did not check whether it goes well. I can share with you when I have updated news. Thank you.
Best wishes. Hope you get a good result.
Hi, thanks for your message. Did you check whether the whole optical motion estimation, MV encoding, and MV decoding parts are fixed during this step? Our latest work also uses a similar progressive training strategy, as shown in the appendix at https://arxiv.org/abs/2111.13850
Thank you for your reply. I have already read your latest work, it is extraordinary. And the training process is described in great detail. In fact, I have trained step by step as described in your latest work. Following your advice, I wrote a simple program that checked whether the whole optical motion estimation, MV encoding, and decoding parts were frozen during the 6 epochs "Recon" training stage. The program and the list of frozen parameters are below. code:
import torch

# Checkpoint saved after the MV stage (epochs 0-3) and after the "Recon" stage (epochs 4-9)
dict_trained_ori3 = torch.load("dcvc_lm256_ckpt_mv_0_3.pth.tar", map_location=torch.device('cpu'))['state_dict']
dict_trained_ori9 = torch.load("dcvc_lm256_ckpt_remain_4_9.pth.tar", map_location=torch.device('cpu'))['state_dict']

# Each line of the text file names a parameter (or prefix) that should stay frozen
finename = "DCVC_Opticflow_MV_Para.txt"
f = open(finename)
para_name = f.readline()
while para_name:
    para_name = para_name.replace("\n", "")
    para_name = para_name.replace("\r", "")
    for k in dict_trained_ori9:
        if k.startswith(para_name):
            kp9 = dict_trained_ori9[k]
    for k in dict_trained_ori3:
        if k.startswith(para_name):
            kp3 = dict_trained_ori3[k]
    # Print every listed parameter whose value changed between the two checkpoints
    if not kp3.equal(kp9):
        print(para_name)
    para_name = f.readline()
f.close()

# The masked autoregressive weights are compared after applying the mask
assert (dict_trained_ori3["auto_regressive_mv.weight"]*dict_trained_ori3["auto_regressive_mv.mask"])\
    .equal(dict_trained_ori9["auto_regressive_mv.weight"]*dict_trained_ori9["auto_regressive_mv.mask"])
Parameter list: DCVC_Opticflow_MV_Para.txt
From this check I think I can confirm that the whole optical motion estimation, MV encoding, and MV decoding parts are fixed during this "Recon" step, as long as my list of parameters to freeze has no omissions. It's strange that the bpp of y doesn't converge. Looking forward to your suggestions and replies. Thanks again.
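As written, the loop in the check above keeps only the last key that matches each prefix, so if a line in the parameter list is a prefix of several keys, only one of them is compared. A minimal sketch of a check that compares every matching key could look like the following. The function name `changed_parameters` and the prefixes in the usage note are hypothetical; the function accepts any two name-to-value mappings and uses `.equal()` when the values provide it, as `torch.Tensor` does.

```python
def changed_parameters(state_a, state_b, prefixes, same=None):
    """Return the names under `prefixes` whose values differ between two state dicts.

    `changed_parameters` is a hypothetical helper, not part of the DCVC code.
    """
    if same is None:
        # torch.Tensor exposes .equal(); fall back to == for plain Python values
        def same(a, b):
            return a.equal(b) if hasattr(a, "equal") else a == b

    wanted = tuple(prefixes)
    changed = []
    for name, value in state_a.items():
        if not name.startswith(wanted):
            continue  # not one of the parts that should be frozen
        # A key missing from the second checkpoint also counts as a change
        if name not in state_b or not same(value, state_b[name]):
            changed.append(name)
    return changed
```

With the two checkpoints from the post, a call such as `changed_parameters(dict_trained_ori3, dict_trained_ori9, prefixes)` would list every frozen-part tensor that moved, where `prefixes` holds the stripped lines of DCVC_Opticflow_MV_Para.txt. An empty result supports the claim that those parts stayed fixed.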
- Have you tried several times, and does every training run hit this problem?
- I think you can also load the weights from my checkpoint. If the problem still happens, Training Step 3 needs further checking. Otherwise, Training Steps 1 and 2 need to be checked.
Thanks a lot for your reply and suggestions.
- Yes.
- I followed your suggestion and ran the Step 3 training starting from the weights of your checkpoint. You are right: the same problem reappeared. The bpp of y keeps rising rapidly as before, whether the learning rate is 1e-4 or 1e-5. I have been checking the Step 3 code but have not found the cause yet. The loss is rd_loss = 256 * distortion + distribution_loss, where the distortion is torch.mean((recon_image - input_image).pow(2)) and the distribution_loss is bpp_y + bpp_z. The AdamW optimizer is used. There might be a bug somewhere that I overlooked.
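For reference, the loss described in this reply can be sketched as below. The thread does not show how bpp_y and bpp_z are computed, so this assumes the common convention of summing -log2 of the entropy-model likelihoods and dividing by the number of pixels. The function name `rate_distortion_loss` and the `likelihoods` dictionary are assumptions made for illustration.

```python
import math

import torch


def rate_distortion_loss(recon_image, input_image, likelihoods, lmbda=256.0):
    """Return (rd_loss, distortion, bpp) for one reconstructed frame.

    `likelihoods` is assumed to map a latent name (e.g. "y", "z") to a tensor
    of per-element probabilities produced by the entropy model.
    """
    n, _, h, w = input_image.shape
    num_pixels = n * h * w

    # Mean squared error, as written in the post
    distortion = torch.mean((recon_image - input_image).pow(2))

    # bits = -log2(p), summed over every latent element, then divided by pixel count
    bpp = sum(torch.log(p).sum() for p in likelihoods.values()) / (-math.log(2) * num_pixels)

    return lmbda * distortion + bpp, distortion, bpp
```

Logging `distortion` and each latent's bpp separately, as the training code later in this thread does, makes it easier to see which term drives a rising total.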
Hello, you closed this issue. Did you solve your problem? My training process also goes wrong. :(
No. :( The author suggested a way to check: train from his public checkpoints. I expected to spend some time experimenting, so I closed the issue. Now it looks like there is more to discuss, so I am reopening it. Maybe we could share the test results after each stage of training to see where the problem is.
Sure, we can have more discussions, as I wished. I noticed the figure you showed above, and I think the training iterations may not be enough, as there are only about 5k iterations in one of your training stages. I also think the cascaded loss in https://arxiv.org/abs/2111.13850, which alleviates error propagation, can be left out at first. It comes from "Lu, G., Cai, C., Zhang, X., Chen, L., Ouyang, W., Xu, D., & Gao, Z. (2020, August). Content adaptive and error propagation aware deep video compression. In European Conference on Computer Vision (pp. 456-472). Springer, Cham". I also need some time to check my own problem. If you don't mind, we could have a further discussion on Zoom in the future.
Oh yes. I have set the batch size to 8. I will change it to 4. Of course, I can use zoom or Tencent meeting or Skype anytime. The problem with my training comes before the cascade training, which is at the "Single" stage. I trained with the rate-distortion function on the quality 0 checkpoints published by the author, and the bpp of mv_y showed a rapid rise after a while during training, whether the learning rate is 1e-4 or 1e-5. The loss is rd_loss = 256 * distortion + distribution_loss. And the distortion is torch.mean((recon_image - input_image).pow(2)). The distribution_loss is bpp = bpp_y + bpp_z + bpp_mv_y + bpp_mv_z. I can give my training code here. It's so short that I don't know where the problem is.
import random

import torch
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

# DCVC_net, architectures, VideoFolder, parse_args and save_checkpoint come from
# the poster's project and are not shown in the post.
# The post uses `writer` without defining it; a TensorBoard SummaryWriter is assumed.
writer = SummaryWriter()


def clip_gradient(optimizer, grad_clip):
    # Clamp every gradient element to [-grad_clip, grad_clip] in place
    for group in optimizer.param_groups:
        for param in group["params"]:
            if param.grad is not None:
                param.grad.data.clamp_(-grad_clip, grad_clip)


def train_0_15(model_dcvc, model_cheng, train_dataloader, optimizer, epoch, args):
    model_cheng.eval()
    model_dcvc.train()
    device = next(model_dcvc.parameters()).device

    for i, batch in enumerate(train_dataloader):
        d = [frames.to(device) for frames in batch]  # 2 frames: d[0], d[1]
        # The I frame is coded by the frozen image codec and replaced by its reconstruction
        with torch.no_grad():
            output_i = model_cheng(d[0])
        d[0] = output_i["x_hat"]
        output_p = model_dcvc(d, epoch)
        distribution_loss = output_p["bpp_loss"][0]
        distortion = output_p["mse_loss"][0]
        distribution_loss_y = output_p["bpp_y"][0]
        distribution_loss_z = output_p["bpp_z"][0]
        distribution_loss_mv_y = output_p["bpp_mv_y"][0]
        distribution_loss_mv_z = output_p["bpp_mv_z"][0]
        rd_loss = args.lmbda * distortion + distribution_loss
        optimizer.zero_grad()
        rd_loss.backward()
        clip_gradient(optimizer, 5)
        optimizer.step()
        # 16140 is the number of iterations per epoch in this setup
        writer.add_scalar('distribution_loss', distribution_loss, (epoch-10)*16140+i)
        writer.add_scalar('distribution_loss_y', distribution_loss_y, (epoch-10)*16140+i)
        writer.add_scalar('distribution_loss_z', distribution_loss_z, (epoch-10)*16140+i)
        writer.add_scalar('distribution_loss_mv_y', distribution_loss_mv_y, (epoch-10)*16140+i)
        writer.add_scalar('distribution_loss_mv_z', distribution_loss_mv_z, (epoch-10)*16140+i)


def main(argv):
    args = parse_args(argv)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(device)
    if args.seed is not None:
        torch.manual_seed(args.seed)
        random.seed(args.seed)

    # Start from the published quality-0 checkpoint
    net_dcvc = DCVC_net()
    load_checkpoint = torch.load("model_dcvc_quality_0_psnr.pth")
    net_dcvc.load_dict(load_checkpoint)
    net_dcvc = net_dcvc.to(device)
    net_dcvc.train()

    # Frozen I-frame codec
    net_cheng_checkpoint = torch.load(args.i_frame_model_path)
    net_cheng = architectures[args.i_frame_model_name].from_state_dict(net_cheng_checkpoint)
    for para in net_cheng.parameters():
        para.requires_grad = False
    net_cheng = net_cheng.to(device)
    net_cheng.eval()

    train_transforms = transforms.Compose(
        [transforms.ToTensor(), transforms.RandomCrop(args.patch_size)]
    )
    test_transforms = transforms.Compose(
        [transforms.ToTensor(), transforms.CenterCrop(args.patch_size)]
    )
    train_dataset_2 = VideoFolder(
        max_frames=2,
        root=args.dataset,
        rnd_interval=True,
        rnd_temp_order=True,
        split="train",
        transform=train_transforms,
    )
    train_dataloader_2 = DataLoader(
        train_dataset_2,
        batch_size=args.batch_size,
        num_workers=args.num_workers,
        shuffle=True,
        pin_memory=(device == "cuda"),
    )

    # "ALL" part of the "Single" stage: every DCVC parameter is trainable
    for para in net_dcvc.parameters():
        para.requires_grad = True
    optimizer_all_1e_5 = optim.AdamW(net_dcvc.parameters(), lr=1e-5)
    for epoch in range(10, 16):
        train_0_15(net_dcvc, net_cheng, train_dataloader_2, optimizer_all_1e_5, epoch, args)
        name_remain = "dcvc_lm256_ckpt_"+str(epoch)+"_e5_fromowner.pth.tar"
        save_checkpoint(
            {
                "epoch": epoch,
                "state_dict": net_dcvc.state_dict(),
            },
            name_remain,
        )
Thanks for sharing. I have two points.
1. Have you modified model_dcvc()? Its inputs are d and epoch? d holds the reference frame and the input frame, but I don't remember epoch being an input as well. Also, how do you increase the number of frames later, for example to 3 and 5 frames as the paper indicates?
2. There are several stages. In the first stage, the motion-compensated frame is used in the loss function rather than output_p (is that the reconstructed frame?).
These are my personal opinions and may not be correct.
I think the global step (iterations) should be used to control the different loss functions and numbers of frames used in different stages.
Can you add my skype account using this link if you don't mind? This makes it easy to start a video conference.
- Yes, I changed the dcvc_net used for training so that multiple frames are passed in as a list. But in the inter-frame encoding stage I strictly follow the author's forward function, and I also use the author's dcvc_net.py for testing. In the "Single" stage, the first frame (d[0]) is encoded and decoded by model_cheng (output_i = model_cheng(d[0])), and the result is assigned back to d[0] (d[0] = output_i["x_hat"]). Then the d list (the reconstructed I frame d[0] and the original P frame d[1]) is passed into model_dcvc to obtain the reconstructed P frame and the corresponding distortion and distribution_loss. When the number of frames increases to 3 or 5, the length of the d list also increases to 3 or 5. Similarly, after the I frame is reconstructed by model_cheng, the d list is sent to model_dcvc, and each P frame is encoded and decoded in turn to obtain its reconstructed image, distortion, and distribution_loss.
- Yes, you are right. In the first 3 steps, I changed the output_p of the dcvc output. It is x_pixel_warp = flow_warp(referframe, quant_mv_upsample_refine), that is, the warped frame in the pixel domain, obtained by using the MV to warp referframe.
- Yes, you are right. The global step should be used to control the different loss functions and frames used in different stages. I assign different training functions, loss functions, and optimizers to different stages, and I test the checkpoints of each stage to see whether it works well. The current problem is that in the "ALL" part of the "Single" stage, the bpp of mv_y does not converge. (My previous problem in the "Recon" stage was solved. I had added unnecessary noise during the quantization process. After removing it, training in this stage is normal.)
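One way to let the global step select the stage, as discussed in the replies above, is sketched here. The stage names and epoch boundaries are read off the checkpoint names (mv_0_3, remain_4_9) and the range(10, 16) loop posted in this thread, and 16140 is the per-epoch iteration count hard-coded in the logging calls. None of these values are taken from the paper's schedule, and `stage_for_step` is a hypothetical helper.

```python
from bisect import bisect_right

STEPS_PER_EPOCH = 16140  # value hard-coded in the logging calls in this thread

# (first epoch of the stage, stage name); inferred from this thread, not from the paper
STAGES = [(0, "mv"), (4, "recon"), (10, "all")]


def stage_for_step(global_step, steps_per_epoch=STEPS_PER_EPOCH, stages=STAGES):
    """Map a global iteration count to the name of the active training stage."""
    if global_step < 0:
        raise ValueError("global_step must be non-negative")
    epoch = global_step // steps_per_epoch
    starts = [start for start, _ in stages]
    # The active stage is the last one whose first epoch is <= the current epoch
    return stages[bisect_right(starts, epoch) - 1][1]
```

A training loop could then pick the loss function, the set of frozen parameters, and the number of frames from the returned name, which keeps the schedule in one table instead of spreading it over separate training functions.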
I have added your skype. Maybe we can discuss it tomorrow or on the weekend? Thank you.
Anytime is OK. :)
Hi :) Could you please share the training code with us? Thank you.
I have released my training code for the “Single” stage in my previous reply, you can check it out. I think if you can get the correct results in the “Single” training stage, it wouldn't be difficult to complete the subsequent training.
Hi, have you solved the problem? I have a similar question. In my opinion, the noise at the quantization step is needed, because the round operation is not usefully differentiable: its gradient is zero in backpropagation.
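To make the point about rounding concrete, here is a small sketch of two surrogates commonly used in learned compression: additive uniform noise and straight-through rounding. It only illustrates the gradient behaviour. Which surrogate the DCVC authors apply to which latent is not stated in this thread, and the function names are hypothetical.

```python
import torch


def quantize_noise(y):
    """Training-time surrogate: add uniform noise in [-0.5, 0.5) instead of rounding."""
    return y + torch.empty_like(y).uniform_(-0.5, 0.5)


def quantize_ste(y):
    """Round in the forward pass, pass the gradient straight through in backward."""
    return y + (torch.round(y) - y).detach()


if __name__ == "__main__":
    y = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)

    # Plain rounding: the gradient reaching y is zero everywhere
    torch.round(y).sum().backward()
    print("round grad:", y.grad)

    # Straight-through rounding: same forward values, gradient of one
    y.grad = None
    quantize_ste(y).sum().backward()
    print("ste grad:  ", y.grad)
```

Both surrogates keep a gradient of one with respect to `y`, whereas plain `torch.round` blocks it.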
Hello! Could you please share the whole training code with us? Thank you very much!
Hello! Recently, I am also working on the training process of this paper. Do you mind sharing yours as a reference for us? Thank you so much!
Hello, we have a paper in the process right now and have considered sharing code after the paper.
Hi, @z1296 @guohf3
Do you have any update? I'm trying progressive training, but the losses for bpp_y and bpp_mv_y do not decrease, and the losses for bpp_mv_z and bpp_z stay at zero. Do you have any idea? Thanks.
Train epoch 0: [38000/51313 (74%)] Loss: 0.010 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38080/51313 (74%)] Loss: 0.004 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38160/51313 (74%)] Loss: 0.005 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38240/51313 (75%)] Loss: 0.009 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38320/51313 (75%)] Loss: 0.007 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38400/51313 (75%)] Loss: 0.016 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38480/51313 (75%)] Loss: 0.010 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38560/51313 (75%)] Loss: 0.009 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38640/51313 (75%)] Loss: 0.006 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38720/51313 (75%)] Loss: 0.005 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38800/51313 (76%)] Loss: 0.007 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38880/51313 (76%)] Loss: 0.009 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
Train epoch 0: [38960/51313 (76%)] Loss: 0.007 | Loss: 0.101 | loss_y: 0.000 | loss_z: 0.06 | loss_mv_y: 0.00 | loss_mv_z: 0.04 |
BRs, tzayuan
Dear author, first of all, thank you very much for sharing your excellent research. It is very innovative and gets outstanding results. I'm trying to write training code based on your article's description. But I encountered a problem in the third stage of training. When I train the whole framework using Loss contextual_coding with only freezing the MV generation part, the bpp of y continues to rise. Although the bpp of z has a slight decrease (the strange thing is that it reaches 0 quickly), the overall bpp shows an upward trend. I tried to reduce the learning rate to 1e-5, but this phenomenon still exists. I put the test results for each epoch below. Looking forward to your reply. Thank you very much. Fig1: The test result of Step 2. Fig2: The test result of the first epoch in Step 3. Fig3: The test result of the second epoch in Step 3. Fig4: The test result of the third epoch in Step 3.