Hi @Raspberrycai1, thank you for your interest in our work, and apologies for the delay in our response!
The released checkpoints were trained with a private codebase, which I cleaned up for the public release. During the cleanup, I renamed a layer for better readability, but this introduced a naming mismatch with the checkpoints. I have fixed this and can confirm that it now works on my end. Please let me know in case you run into further trouble!
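For anyone who hits the same mismatch before pulling the fix, here is a minimal sketch of remapping renamed checkpoint keys at load time. The old and new layer names below are hypothetical placeholders (not GNT's actual names), and I'm assuming the weights live under a "model" key; adjust both to match the real checkpoint:

```python
import torch

# Hypothetical rename: suppose a layer was called "attn.fc" in the private
# codebase but "attention.linear" in the released one. Adjust to the real names.
RENAMES = {"attn.fc": "attention.linear"}

ckpt = torch.load("trex_model_300000.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # assumed layout; GNT's key names may differ

remapped = {}
for key, value in state_dict.items():
    for old, new in RENAMES.items():
        if key.startswith(old):
            key = new + key[len(old):]  # swap the renamed prefix, keep the rest
            break
    remapped[key] = value
# model.load_state_dict(remapped)  # then load into the released architecture
```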
Thanks
Yes, it works. Thanks a lot for your effort.
Dear authors, thank you for your great work.
I would like to resume training from your provided pre-trained models, so I ran the following command:
python train.py --config configs/gnt_llff.txt --ckpt_path=./trex_model_300000.pth --train_scenes trex --eval_scenes trex --expname resume_trex --chunk_size 500 --N_samples 20
But it raises a RuntimeError:
The size of tensor a (64) must match the size of tensor b (4) at non-singleton dimension 1
when executing this line: https://github.com/VITA-Group/GNT/blob/33a99a9cfb110c6d5de124684f4aa6ab930ea4ae/train.py#L144. The following is the traceback:
By checking the pre-trained optimizer, I found that the ['optimizer']['state'] entries of some layers have shapes that do not match their corresponding model layers. For example, the optimizer state belonging to the 15th layer is a tensor of shape torch.Size([64, 64]), but the 15th layer of the GNT model has shape torch.Size([8, 4]). Digging further, I realized that the layer ordering in the pre-trained optimizer differs from that of the freshly initialized GNT model. For this problem, I am not sure whether this issue applies. I don't know what caused this reordering. Hope you can help me.
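For reference, this is roughly how I compared the shapes. It is a sketch that assumes the checkpoint stores the optimizer state_dict under an "optimizer" key (as the checkpoint I inspected does), the optimizer is Adam (so each state entry carries an exp_avg buffer shaped like its parameter), and the states are indexed by parameter order:

```python
import torch

def check_optimizer_shapes(ckpt_path, model):
    """Print every Adam state buffer whose shape disagrees with the
    model parameter at the same index."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    params = list(model.parameters())  # model built as in train.py
    for idx, state in ckpt["optimizer"]["state"].items():
        exp_avg = state["exp_avg"]  # first-moment buffer, same shape as its parameter
        if exp_avg.shape != params[idx].shape:
            print(f"param {idx}: optimizer state {tuple(exp_avg.shape)} "
                  f"!= model parameter {tuple(params[idx].shape)}")
```

If the orderings really do differ, one stop-gap might be to resume from the model weights alone with a freshly initialized optimizer, at the cost of losing the saved momentum statistics.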