DeepMC-DCVC / DCVC

Apache License 2.0

Step 3 of the training process does not converge. #8

Open: z1296 opened this issue 2 years ago

z1296 commented 2 years ago

Dear author, first of all, thank you very much for sharing your excellent research. It is very innovative and achieves outstanding results. I'm trying to write training code based on the description in your paper, but I have run into a problem in the third stage of training. When I train the whole framework with the contextual_coding loss, freezing only the MV generation part, the bpp of y keeps rising. The bpp of z decreases slightly (strangely, it quickly reaches 0), but the overall bpp trends upward. I tried reducing the learning rate to 1e-5, but the problem persists. The test results for each epoch are below. Looking forward to your reply. Thank you very much.

[Fig. 1: test result of Step 2.]
[Fig. 2: test result of the first epoch of Step 3.]
[Fig. 3: test result of the second epoch of Step 3.]
[Fig. 4: test result of the third epoch of Step 3.]
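The Step 3 setup described above (train everything except the MV generation part) can be sketched as follows. This is a minimal, hedged example: the module names `opticFlow`, `mvEncoder`, `mvDecoder`, `contextualEncoder` and the toy model are hypothetical placeholders, not necessarily DCVC's actual names. It shows two things: setting `requires_grad` to False on the frozen parameters, and handing only the remaining parameters to the optimizer.

```python
import torch
import torch.nn as nn


class ToyCodec(nn.Module):
    """Hypothetical stand-in for a video codec with a motion branch and a context branch."""

    def __init__(self):
        super().__init__()
        self.opticFlow = nn.Conv2d(6, 2, 3, padding=1)          # motion estimation (frozen)
        self.mvEncoder = nn.Conv2d(2, 8, 3, padding=1)          # MV encoder (frozen)
        self.mvDecoder = nn.Conv2d(8, 2, 3, padding=1)          # MV decoder (frozen)
        self.contextualEncoder = nn.Conv2d(3, 8, 3, padding=1)  # trained in this step


# Hypothetical parameter-name prefixes of the modules to freeze.
FROZEN_PREFIXES = ("opticFlow.", "mvEncoder.", "mvDecoder.")


def freeze_by_prefix(model: nn.Module, prefixes) -> list:
    """Freeze parameters whose name starts with one of `prefixes`; return the trainable rest."""
    trainable = []
    for name, param in model.named_parameters():
        if name.startswith(tuple(prefixes)):
            param.requires_grad_(False)
        else:
            param.requires_grad_(True)
            trainable.append(param)
    return trainable


model = ToyCodec()
trainable_params = freeze_by_prefix(model, FROZEN_PREFIXES)
# Passing only the trainable parameters means AdamW cannot touch the frozen weights,
# even if stale .grad tensors were left over from an earlier stage.
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)
```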

guohf3 commented 2 years ago

I am writing the training code too, and I'm also waiting for the authors' reply. Thank you.

z1296 commented 2 years ago

Nice to see your comment. Did your training process go well?

guohf3 commented 2 years ago

My training is still running, and I haven't checked yet whether it is going well. I'll share any news with you. Thank you.

z1296 commented 2 years ago

Best wishes. Hope you get a good result.

DeepMC-DCVC commented 2 years ago

Hi, thanks for your message. Did you check whether the whole optical motion estimation, MV encoding, and MV decoding parts are fixed during this step? Our latest work also uses a similar progressive training strategy, as shown in the appendix of https://arxiv.org/abs/2111.13850

z1296 commented 2 years ago

Thank you for your reply. I have already read your latest work; it is extraordinary, and the training process is described in great detail. In fact, I have trained step by step as described there. Following your advice, I wrote a simple program that checks whether the whole optical motion estimation, MV encoding, and MV decoding parts stayed frozen during the 6-epoch "Recon" training stage. The program and the list of frozen parameters are below. Code:

    import torch

    # Checkpoints saved before and after the "Recon" stage.
    dict_trained_ori3 = torch.load("dcvc_lm256_ckpt_mv_0_3.pth.tar", map_location="cpu")["state_dict"]
    dict_trained_ori9 = torch.load("dcvc_lm256_ckpt_remain_4_9.pth.tar", map_location="cpu")["state_dict"]

    # One parameter-name prefix per line: every module that should stay frozen.
    with open("DCVC_Opticflow_MV_Para.txt") as f:
        para_names = [line.strip() for line in f if line.strip()]

    for para_name in para_names:
        keys = [k for k in dict_trained_ori3 if k.startswith(para_name)]
        if not keys:
            print("no parameter matches:", para_name)
        # Compare every tensor under this prefix, not only one of them.
        for k in keys:
            if not dict_trained_ori3[k].equal(dict_trained_ori9[k]):
                print("changed:", k)

    # The masked auto-regressive MV context model must also be unchanged.
    assert (dict_trained_ori3["auto_regressive_mv.weight"] * dict_trained_ori3["auto_regressive_mv.mask"]).equal(
        dict_trained_ori9["auto_regressive_mv.weight"] * dict_trained_ori9["auto_regressive_mv.mask"]
    )

(Attached: DCVC_Opticflow_MV_Para.txt.) From this check I believe I can confirm that the whole optical motion estimation, MV encoding, and MV decoding parts are fixed during the "Recon" step, as long as my list of parameters to freeze has no omissions. It's strange that the bpp of y doesn't converge. Looking forward to your suggestions. Thanks again.

DeepMC-DCVC commented 2 years ago
  1. Have you tried several times, and does every run hit this problem?
  2. You can also load the weights from my checkpoint. If the problem still happens, Training Step 3 needs further checking; otherwise, Training Steps 1 and 2 need to be checked.
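When starting from the released checkpoint with a modified training network, it is worth confirming that every weight was actually loaded. A minimal sketch using PyTorch's `nn.Module.load_state_dict(..., strict=False)`, which returns the missing and unexpected keys; the two toy `nn.Sequential` models are hypothetical stand-ins for the released model and a modified training model. Note that shape mismatches still raise an error even with `strict=False`.

```python
import torch
import torch.nn as nn


def load_and_report(model: nn.Module, state_dict: dict):
    """Load weights non-strictly and print every key that did not match."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys:
        print("missing (left at their initial values):", result.missing_keys)
    if result.unexpected_keys:
        print("unexpected (ignored):", result.unexpected_keys)
    return result


# Toy stand-ins: the "modified" model has one extra layer that the checkpoint lacks.
released = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 3, 3))
modified = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 3, 3), nn.Conv2d(3, 3, 1))
report = load_and_report(modified, released.state_dict())
```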
z1296 commented 2 years ago

Thanks a lot for your reply and suggestions.

  1. Yes.
  2. I followed your suggestion and ran the Step 3 training starting from the weights of your checkpoint. You are right: the same problem reappeared. The bpp of y keeps rising rapidly as before, whether the learning rate is 1e-4 or 1e-5. I have been checking my Step 3 code but haven't found the cause yet. The loss is rd_loss = 256 * distortion + distribution_loss, where distortion = torch.mean((recon_image - input_image).pow(2)) and distribution_loss = bpp_y + bpp_z. I use the AdamW optimizer. There may be a bug somewhere that I have overlooked. [Figure: TensorBoard curves]
guohf3 commented 2 years ago

Hello, you closed this issue; did you solve the problem? My training process also goes wrong. :(

z1296 commented 2 years ago

No. :( The author suggested a way to check: train from his public checkpoints. I might have to spend some time experimenting, so I closed the issue. Now it looks like there's more to discuss, so I have reopened it. Maybe we could share the test results after each training stage to see where the problem is.

guohf3 commented 2 years ago

Sure, we can discuss further, as I hoped. Looking at the figure you showed above, I think the number of training iterations may not be enough, since there are only about 5k iterations in each of your training stages. I also think the cascaded loss used in https://arxiv.org/abs/2111.13850 to alleviate error propagation can be left out at first; it comes from "Lu, G., Cai, C., Zhang, X., Chen, L., Ouyang, W., Xu, D., & Gao, Z. (2020, August). Content adaptive and error propagation aware deep video compression. In European Conference on Computer Vision (pp. 456-472). Springer, Cham". I also need some time to check my own problem. If you don't mind, we could have a further discussion on Zoom in the future.
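For readers unfamiliar with the cascaded loss mentioned above, a rough sketch of the idea: the I-frame reconstruction starts the chain, each P-frame is coded with the previous *reconstruction* as its reference, and the per-frame rate-distortion losses are combined (averaged here; a sum is also common) so gradients flow back through earlier frames. `toy_p_codec` is a hypothetical stand-in returning `(recon, bpp)`; it is not the DCVC forward pass, and `frames[0]` stands in for the I-frame reconstruction.

```python
import torch


def toy_p_codec(ref, cur, scale):
    """Hypothetical P-frame codec returning (reconstruction, bpp); NOT the DCVC forward pass."""
    recon = scale * cur + (1 - scale) * ref
    bpp = scale.abs()
    return recon, bpp


def cascaded_rd_loss(frames, i_frame_recon, codec, lmbda=256.0):
    """Average RD loss over a GOP in which each P-frame references the previous reconstruction."""
    ref = i_frame_recon
    losses = []
    for cur in frames[1:]:
        recon, bpp = codec(ref, cur)
        mse = torch.mean((recon - cur) ** 2)
        losses.append(lmbda * mse + bpp)
        ref = recon  # not detached: errors and gradients propagate along the chain
    return torch.stack(losses).mean()


scale = torch.tensor(0.5, requires_grad=True)
frames = [torch.rand(1, 3, 8, 8) for _ in range(5)]  # 1 I-frame + 4 P-frames
loss = cascaded_rd_loss(frames, frames[0], lambda ref, cur: toy_p_codec(ref, cur, scale))
loss.backward()
```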

z1296 commented 2 years ago

Oh yes, I set the batch size to 8; I will change it to 4. Of course, I can use Zoom, Tencent Meeting, or Skype anytime. My training problem occurs before the cascaded training, in the "Single" stage. I trained with the rate-distortion loss starting from the quality-0 checkpoint published by the author, and after a while the bpp of mv_y rose rapidly, whether the learning rate was 1e-4 or 1e-5. The loss is rd_loss = 256 * distortion + distribution_loss, where distortion = torch.mean((recon_image - input_image).pow(2)) and distribution_loss is bpp = bpp_y + bpp_z + bpp_mv_y + bpp_mv_z. [Figure: 5_loss] Here is my training code. It is so short that I can't see where the problem is.
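As a reference for the loss described above, a minimal sketch of how such a rate term is commonly computed in learned codecs (CompressAI's `RateDistortionLoss` uses the same normalization): each bpp component is the total -log2 likelihood of that latent divided by N·H·W, and the four components are summed before adding λ·MSE. The likelihood tensors below are made-up placeholders; in a real model they come from the entropy models for y, z, mv_y and mv_z.

```python
import torch


def bpp_from_likelihoods(likelihoods, num_pixels):
    """Estimated bits per pixel: total -log2(p) over all latent elements / pixel count."""
    return torch.sum(-torch.log2(likelihoods)) / num_pixels


def rate_distortion_loss(recon, target, likelihoods, lmbda=256.0):
    """lmbda * MSE + sum of the bpp of every latent (y, z, mv_y, mv_z)."""
    n, _, h, w = target.shape
    num_pixels = n * h * w
    bpp = {name: bpp_from_likelihoods(p, num_pixels) for name, p in likelihoods.items()}
    distortion = torch.mean((recon - target) ** 2)
    return lmbda * distortion + sum(bpp.values()), bpp


# Placeholder likelihoods of 0.5 everywhere, i.e. exactly 1 bit per latent element.
target = torch.rand(1, 3, 16, 16)  # 256 pixels
likelihoods = {
    "y": torch.full((1, 64, 4, 4), 0.5),
    "z": torch.full((1, 64, 1, 1), 0.5),
    "mv_y": torch.full((1, 32, 4, 4), 0.5),
    "mv_z": torch.full((1, 32, 1, 1), 0.5),
}
loss, bpp = rate_distortion_loss(target.clone(), target, likelihoods)
# bpp: y = 1024/256 = 4.0, z = 64/256 = 0.25, mv_y = 512/256 = 2.0, mv_z = 32/256 = 0.125
```

Logging each component separately, as in the posted TensorBoard curves, is what makes it possible to see which latent diverges.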

    import random
    import sys

    import torch
    import torch.optim as optim
    from compressai.datasets import VideoFolder
    from torch.utils.data import DataLoader
    from torch.utils.tensorboard import SummaryWriter
    from torchvision import transforms

    # Modules from the DCVC repository (same imports as its test_video.py).
    from src.models.DCVC_net import DCVC_net
    from src.zoo.image import model_architectures as architectures

    # parse_args() and save_checkpoint() are helper functions defined elsewhere in this script.

    writer = SummaryWriter()


    def clip_gradient(optimizer, grad_clip):
        # Clamp every gradient element to [-grad_clip, grad_clip].
        for group in optimizer.param_groups:
            for param in group["params"]:
                if param.grad is not None:
                    param.grad.data.clamp_(-grad_clip, grad_clip)


    def train_0_15(model_dcvc, model_cheng, train_dataloader, optimizer, epoch, args):
        model_cheng.eval()
        model_dcvc.train()
        device = next(model_dcvc.parameters()).device
        iters_per_epoch = len(train_dataloader)  # depends on batch size, so don't hard-code it
        for i, batch in enumerate(train_dataloader):
            d = [frames.to(device) for frames in batch]  # 2 frames: d[0], d[1]
            # Code the first frame with the frozen I-frame model; its reconstruction is the reference.
            with torch.no_grad():
                output_i = model_cheng(d[0])
            d[0] = output_i["x_hat"]
            output_p = model_dcvc(d, epoch)
            distribution_loss = output_p["bpp_loss"][0]
            distortion = output_p["mse_loss"][0]
            distribution_loss_y = output_p["bpp_y"][0]
            distribution_loss_z = output_p["bpp_z"][0]
            distribution_loss_mv_y = output_p["bpp_mv_y"][0]
            distribution_loss_mv_z = output_p["bpp_mv_z"][0]
            rd_loss = args.lmbda * distortion + distribution_loss
            optimizer.zero_grad()
            rd_loss.backward()
            clip_gradient(optimizer, 5)
            optimizer.step()
            step = (epoch - 10) * iters_per_epoch + i  # global step within this stage (epochs 10-15)
            writer.add_scalar("distribution_loss", distribution_loss, step)
            writer.add_scalar("distribution_loss_y", distribution_loss_y, step)
            writer.add_scalar("distribution_loss_z", distribution_loss_z, step)
            writer.add_scalar("distribution_loss_mv_y", distribution_loss_mv_y, step)
            writer.add_scalar("distribution_loss_mv_z", distribution_loss_mv_z, step)


    def main(argv):
        args = parse_args(argv)
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(device)
        if args.seed is not None:
            torch.manual_seed(args.seed)
            random.seed(args.seed)

        # P-frame model, initialized from the author's released quality-0 checkpoint.
        net_dcvc = DCVC_net()
        load_checkpoint = torch.load("model_dcvc_quality_0_psnr.pth", map_location="cpu")
        net_dcvc.load_dict(load_checkpoint)
        net_dcvc = net_dcvc.to(device)
        net_dcvc.train()

        # Frozen I-frame model.
        net_cheng_checkpoint = torch.load(args.i_frame_model_path, map_location="cpu")
        net_cheng = architectures[args.i_frame_model_name].from_state_dict(net_cheng_checkpoint)
        for para in net_cheng.parameters():
            para.requires_grad = False
        net_cheng = net_cheng.to(device)
        net_cheng.eval()

        train_transforms = transforms.Compose(
            [transforms.ToTensor(), transforms.RandomCrop(args.patch_size)]
        )
        test_transforms = transforms.Compose(  # not used in this excerpt
            [transforms.ToTensor(), transforms.CenterCrop(args.patch_size)]
        )
        train_dataset_2 = VideoFolder(
            max_frames=2,
            root=args.dataset,
            rnd_interval=True,
            rnd_temp_order=True,
            split="train",
            transform=train_transforms,
        )
        train_dataloader_2 = DataLoader(
            train_dataset_2,
            batch_size=args.batch_size,
            num_workers=args.num_workers,
            shuffle=True,
            pin_memory=(device == "cuda"),
        )

        # "ALL" part of the "Single" stage: every DCVC parameter is trained.
        for para in net_dcvc.parameters():
            para.requires_grad = True
        optimizer_all_1e_5 = optim.AdamW(net_dcvc.parameters(), lr=1e-5)
        for epoch in range(10, 16):
            train_0_15(net_dcvc, net_cheng, train_dataloader_2, optimizer_all_1e_5, epoch, args)
            name_remain = "dcvc_lm256_ckpt_" + str(epoch) + "_e5_fromowner.pth.tar"
            save_checkpoint(
                {
                    "epoch": epoch,
                    "state_dict": net_dcvc.state_dict(),
                },
                name_remain,
            )


    if __name__ == "__main__":
        main(sys.argv[1:])
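One detail of the code above worth noting: `clip_gradient` clamps each gradient element to [-5, 5] independently, which can change the direction of the update. A common alternative is norm-based clipping with PyTorch's `torch.nn.utils.clip_grad_norm_`, which rescales all gradients jointly and preserves their direction. The comparison below is only an illustration on a single toy parameter, not a claim that the clipping method causes the divergence discussed in this thread.

```python
import torch
import torch.nn as nn

param = nn.Parameter(torch.zeros(2))

# Element-wise clamping, as in clip_gradient: [30, 4] -> [5, 4], so the direction changes.
param.grad = torch.tensor([30.0, 4.0])
clamped = param.grad.clone().clamp_(-5.0, 5.0)

# Norm clipping: the whole gradient is rescaled so its L2 norm is at most max_norm,
# keeping the 30:4 ratio. The returned value is the total norm *before* clipping.
total_norm = nn.utils.clip_grad_norm_([param], max_norm=5.0)
```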
guohf3 commented 2 years ago

Thanks for sharing. I have two points. 1. Have you modified model_dcvc()? Its inputs are d and epoch? d holds the reference frame and the input frame, but I don't remember epoch being an input. Also, how do you increase the number of frames later, e.g. to 3 and 5 frames as the paper indicates? 2. There are several stages; in the first stage, the loss is computed on the motion-compensated frame rather than on output_p (is that the reconstructed frame?). These are my personal opinions and may not be correct.

guohf3 commented 2 years ago

I think the global step (iterations) should be used to control the different loss functions and numbers of frames used in the different stages.
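One way to drive the stage-dependent loss and frame count from a global step, as suggested above, is a small lookup table. The stage names loosely echo those used in this thread ("mv", "Recon", "ALL", cascade), but the step boundaries, loss labels, and frame counts below are hypothetical placeholders, not values from the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    loss: str        # hypothetical label for which loss to use
    num_frames: int  # frames per training clip (I-frame + P-frames)


# Hypothetical schedule: (first global step of the stage, stage config), sorted by step.
SCHEDULE = [
    (0, Stage("motion", "warp_mse + mv_bpp", 2)),
    (200_000, Stage("recon", "recon_mse", 2)),
    (400_000, Stage("all", "rd", 2)),
    (600_000, Stage("cascade", "rd_cascaded", 5)),
]
_STARTS = [start for start, _ in SCHEDULE]


def stage_at(global_step: int) -> Stage:
    """Return the stage that is active at `global_step`."""
    idx = bisect_right(_STARTS, global_step) - 1
    if idx < 0:
        raise ValueError("global_step must be non-negative")
    return SCHEDULE[idx][1]
```

Inside the training loop one would call `stage_at(global_step)` each iteration and rebuild the optimizer (and the set of frozen parameters) whenever the returned stage name changes.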

z1296 commented 2 years ago

  1. Yes, I changed the DCVC_net used for training so that it takes multiple frames as a list. But for the inter-frame coding itself, I strictly follow the author's forward function, and I use the author's DCVC_net.py for testing. In the "Single" stage, the first frame (d[0]) is encoded and decoded by model_cheng (output_i = model_cheng(d[0])), and the result is assigned back to d[0] (d[0] = output_i["x_hat"]). Then the list d (the reconstructed I-frame d[0] and the original P-frame d[1]) is passed into model_dcvc to obtain the reconstructed P-frame and the corresponding distortion and distribution_loss. When the number of frames increases to 3 or 5, the length of d also increases to 3 or 5. Likewise, after the I-frame is reconstructed by model_cheng, d is sent to model_dcvc, and each P-frame is encoded and decoded in turn to obtain its reconstruction, distortion, and distribution_loss.
  2. Yes, you are right. For the first 3 steps, I changed output_p of DCVC to x_pixel_warp = flow_warp(referframe, quant_mv_upsample_refine), i.e. the warped frame in the pixel domain, obtained by warping referframe with the MV.
  3. Yes, you are right. The global step should be used to control the different loss functions and frames used in different stages. I assign different training functions, loss functions, and optimizers to different stages, and I test the checkpoint of each stage to see whether it works well. The current problem is that in the "ALL" part of the "Single" stage, the bpp of mv_y does not converge. (My earlier problem in the "Recon" stage is solved: I had added unnecessary noise during quantization. After removing it, training in that stage is normal.)
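The warped frame mentioned in point 2 is a backward warp of the reference frame by the decoded motion field. A minimal sketch of such a `flow_warp` with `torch.nn.functional.grid_sample`, assuming the flow is in pixel units with channel 0 = horizontal and channel 1 = vertical displacement; this is a generic implementation, not necessarily identical to the one in the DCVC repository.

```python
import torch
import torch.nn.functional as F


def flow_warp(ref: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `ref` (N, C, H, W) with a pixel-unit flow field `flow` (N, 2, H, W)."""
    _, _, h, w = ref.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=ref.dtype, device=ref.device),
        torch.arange(w, dtype=ref.dtype, device=ref.device),
        indexing="ij",
    )
    x = xs.unsqueeze(0) + flow[:, 0]  # (N, H, W) sampling positions
    y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1], matching grid_sample with align_corners=True.
    x = 2.0 * x / max(w - 1, 1) - 1.0
    y = 2.0 * y / max(h - 1, 1) - 1.0
    grid = torch.stack((x, y), dim=-1)  # (N, H, W, 2); last dim ordered (x, y)
    return F.grid_sample(ref, grid, mode="bilinear", padding_mode="border", align_corners=True)
```

In the early stages described above, the loss would then be the MSE between `flow_warp(reference, decoded_mv)` and the current frame.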

Could you add my Skype account using this link, if you don't mind? That would make it easy to start a video conference.

guohf3 commented 2 years ago

I have added you on Skype. Maybe we can discuss it tomorrow or on the weekend? Thank you.

z1296 commented 2 years ago

Anytime is OK. :)

jungwoocode commented 2 years ago

Hi :) Could you please share the training code with us? Thank you.

z1296 commented 2 years ago

I posted my training code for the “Single” stage in my previous reply; you can check it out. I think that if you can get correct results in the “Single” training stage, completing the subsequent training shouldn't be difficult.

HawkSun562 commented 2 years ago

Hi, have you solved the problem? I have a similar issue. Regarding the quantization noise you mentioned: in my opinion, the noise is needed, because rounding is not differentiable; without it, the gradients would be zero in backpropagation.
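To illustrate the point about quantization: `torch.round` has zero gradient, so training commonly replaces it with additive uniform noise U(-0.5, 0.5) or with a straight-through estimator (forward pass rounds, backward pass is the identity), while hard rounding is used at inference. Which proxy is applied where is a design choice; some implementations use noise for the rate estimate and straight-through rounding for the decoder input. A minimal sketch; `quantize` and its `mode` argument are hypothetical names.

```python
import torch


def quantize(x: torch.Tensor, training: bool, mode: str = "noise") -> torch.Tensor:
    """Training-time proxy for rounding; hard rounding at inference."""
    if not training:
        return torch.round(x)
    if mode == "noise":
        # Additive uniform noise in [-0.5, 0.5): differentiable w.r.t. x.
        return x + torch.empty_like(x).uniform_(-0.5, 0.5)
    if mode == "ste":
        # Straight-through estimator: forward = round(x), backward = identity.
        return x + (torch.round(x) - x).detach()
    raise ValueError(f"unknown mode: {mode}")


x = torch.tensor([0.2, 1.7, -2.4], requires_grad=True)
torch.round(x).sum().backward()
# x.grad is all zeros here: plain rounding blocks the gradient.
```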

fanqiNO1 commented 2 years ago

Hello! Could you please share the whole training code with us? Thank you very much!

Kiteretsu77 commented 1 year ago

Hello! I am also working on the training process for this paper. Would you mind sharing your code as a reference for us? Thank you so much!

guohf3 commented 1 year ago

Hello, we have a paper in progress right now and are considering sharing the code after the paper is out.

tzayuan commented 1 year ago

Hi, @z1296 @guohf3

Do you have any updates? I'm trying progressive training, but the bpp_y and bpp_mv_y losses don't decrease, and the bpp_mv_z and bpp_z losses stay at zero. Do you have any idea why? Thanks.

Train epoch 0: [38000/51313 (74%)]  Loss: 0.010 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38080/51313 (74%)]  Loss: 0.004 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38160/51313 (74%)]  Loss: 0.005 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38240/51313 (75%)]  Loss: 0.009 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38320/51313 (75%)]  Loss: 0.007 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38400/51313 (75%)]  Loss: 0.016 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38480/51313 (75%)]  Loss: 0.010 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38560/51313 (75%)]  Loss: 0.009 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38640/51313 (75%)]  Loss: 0.006 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38720/51313 (75%)]  Loss: 0.005 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38800/51313 (76%)]  Loss: 0.007 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38880/51313 (76%)]  Loss: 0.009 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |
Train epoch 0: [38960/51313 (76%)]  Loss: 0.007 |   Loss: 0.101 |   loss_y: 0.000 | loss_z: 0.06 |  loss_mv_y: 0.00 |   loss_mv_z: 0.04 |

BRs, tzayuan