ayushtewari / DFM

Implementation of "Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision"
https://diffusion-with-forward-models.github.io/

Error when fine-tuning the model at resolution 128 while loading parameters from resolution 64 #7

Closed thucz closed 11 months ago

thucz commented 11 months ago

Hi! I encountered an error:

Traceback (most recent call last):
  File "/group/30042/ozhengchen/pano_aigc/DFM/experiment_scripts/train_3D_diffusion.py", line 86, in train
    trainer = Trainer(
  File "/group/30042/ozhengchen/pano_aigc/DFM/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 1028, in __init__
    self.load(checkpoint_path)
  File "/group/30042/ozhengchen/pano_aigc/DFM/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 1106, in load
    model.load_state_dict(data["model"], strict=True)
  File "/group/30042/ozhengchen/ft_local/anaconda3/envs/dfm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
        size mismatch for model.enc.pos_embed: copying a param with shape torch.Size([1, 256, 1152]) from checkpoint, the shape in current model is torch.Size([1, 1024, 1152]).

Do you have any idea about how to fix it?
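The shapes in the traceback are consistent with a ViT-style encoder that keeps one positional-embedding row per patch token: 256 tokens would correspond to a 16×16 patch grid (64-px inputs), and 1024 to a 32×32 grid (128-px inputs). Below is a minimal, hypothetical sketch (not the repo's code) of how such a checkpoint `pos_embed` could be bilinearly resized to a new token count, assuming the tokens form a square grid:

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor, new_tokens: int) -> torch.Tensor:
    """Resize a (1, N, D) positional embedding to (1, new_tokens, D).

    Assumes the N tokens form a square grid (N = g*g), as in ViT-style
    encoders; interpolation is bilinear over that grid.
    """
    _, n, dim = pos_embed.shape
    old_g = int(n ** 0.5)
    new_g = int(new_tokens ** 0.5)
    assert old_g * old_g == n and new_g * new_g == new_tokens
    # (1, N, D) -> (1, D, g, g) so we can interpolate spatially
    grid = pos_embed.reshape(1, old_g, old_g, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_g, new_g), mode="bilinear", align_corners=False)
    # back to (1, new_tokens, D)
    return grid.permute(0, 2, 3, 1).reshape(1, new_tokens, dim)

# e.g. 256 tokens (16x16 grid, 64-px inputs) -> 1024 tokens (32x32 grid, 128-px inputs)
old = torch.randn(1, 256, 1152)
new = interpolate_pos_embed(old, 1024)
```

Whether resizing the embedding is appropriate here depends on how DFM's encoder uses it; matching the construction-time image size to the checkpoint (as suggested below in the thread) is the simpler route.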

1ssb commented 11 months ago

Oh, I faced this issue as well. You have to set the image size to 128, 128 and the trainer parameters to 128 as well.
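The underlying point is that the model must be constructed at the same resolution as the checkpoint it loads, or `strict=True` loading fails on the `pos_embed` shape. A standalone toy reproduction (class and token counts are illustrative, not DFM's actual API):

```python
import torch
import torch.nn as nn

class Enc(nn.Module):
    """Toy stand-in for a patch encoder: one pos_embed row per patch token."""
    def __init__(self, tokens: int, dim: int = 1152):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.zeros(1, tokens, dim))

# Checkpoint saved at the lower resolution: 16x16 = 256 patch tokens.
ckpt = Enc(256).state_dict()

# Rebuilding the model with 32x32 = 1024 tokens and loading strictly
# reproduces the size-mismatch error from the traceback.
mismatch = False
try:
    Enc(1024).load_state_dict(ckpt, strict=True)
except RuntimeError as e:
    mismatch = "size mismatch" in str(e)

# Constructing at the checkpoint's resolution loads cleanly.
Enc(256).load_state_dict(ckpt, strict=True)
```

In other words, keeping the model's image size and the Trainer's image size consistent with the checkpoint avoids the error.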


thucz commented 11 months ago

Thanks!