Open lukasHoel opened 1 year ago
Thank you for the interest. I will push a fix soon.
> we could just start from second stage training directly. I wonder if it makes a huge difference
Sometimes the diffusion pixel-NeRF doesn't converge well without the PN init, but we don't have a conclusive answer.
fixed in 6c01dc83ed584a6a86cf8a936903383895b5a595
Thank you very much for the fast help, really appreciate it! May I also ask how to fix the error mentioned here? I get the same error message and guess the problem could be similar.
fixed
Now everything works. I just improved the loading functionality a bit more. The current implementation would always throw away the weights for model.enc.pos_embed, even if we continue training a checkpoint at the same stage.
I believe we should instead first try to load everything (e.g., when continuing training at the same stage) and apply this fix only if that fails (e.g., when loading a 64x64 checkpoint into a 128x128 model, where model.enc.pos_embed needs to be re-initialized):
    try:
        # load all parameters (e.g., continue training at the same stage)
        model.load_state_dict(data["model"], strict=True)
    except RuntimeError:
        # load_state_dict raises RuntimeError on missing/unexpected keys or size mismatches
        print("loading with strict=True failed. Assume we continue from a 64x64 checkpoint and skip certain layers.")
        # e.g., if loading a 64x64 checkpoint into a 128x128 model, this needs to be re-initialized
        data["model"].pop("model.enc.pos_embed")
        # strict=False returns the missing/unexpected keys, which we print for inspection
        print(model.load_state_dict(data["model"], strict=False))
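A possible alternative to the try/except fallback, as a minimal sketch and not the repository's actual loader: keep only the checkpoint entries whose shapes match the model, so that any resized layer (not just model.enc.pos_embed) is skipped. The helper name filter_compatible, the key model.enc.proj.weight, and all shapes in the demo are hypothetical. The function only assumes that each value exposes a .shape attribute, as PyTorch tensors do.

```python
from types import SimpleNamespace


def filter_compatible(checkpoint_state, model_state):
    """Return (compatible, skipped): entries that fit the model, and names that do not."""
    compatible, skipped = {}, []
    for name, value in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(value.shape):
            compatible[name] = value
        else:
            skipped.append(name)
    return compatible, skipped


# Stand-ins for tensors; names other than pos_embed and all shapes are made up.
checkpoint = {
    "model.enc.pos_embed": SimpleNamespace(shape=(1, 64, 8)),
    "model.enc.proj.weight": SimpleNamespace(shape=(8, 8)),
}
current = {
    "model.enc.pos_embed": SimpleNamespace(shape=(1, 256, 8)),
    "model.enc.proj.weight": SimpleNamespace(shape=(8, 8)),
}
compatible, skipped = filter_compatible(checkpoint, current)
print(skipped)  # ['model.enc.pos_embed']
```

With a real model this could be called as filter_compatible(data["model"], model.state_dict()), followed by model.load_state_dict(compatible, strict=False), so the skipped layers keep their fresh initialization.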
Hi,
I tried to follow the instructions and first train a pixel-nerf checkpoint and then finetune. However, there are several issues when loading the state-dict for the second-stage training.
Some sources of error are:
- feats_cond=False in the first stage, but feats_cond=True in the second stage. This results in different values of conf_feats_dim here: https://github.com/ayushtewari/DFM/blob/862e9b338b169f59275ee7c07a54c6d95cd52435/PixelNeRF/pixelnerf_model_cond.py#L78
- n_feats_out=0 in the first stage, but n_feats_out=64 in the second stage. This results in different shapes of the pixelNeRF mlp lin_out matrices.

I assume there needs to be an updated loading function. Could you please check whether you provided the correct loading function?
Otherwise, we could just start from second stage training directly. I wonder if it makes a huge difference?
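To pin down which parameters differ between the two stages, a small diagnostic along these lines might help. It is only a sketch: the helper name diff_state_dicts, the key names, and the shapes in the demo are hypothetical and not taken from the DFM code. It assumes values expose a .shape attribute, as the tensors in a PyTorch state dict do.

```python
from types import SimpleNamespace


def diff_state_dicts(checkpoint_state, model_state):
    """Classify parameter names as missing, unexpected, or shape-mismatched."""
    report = {"missing": [], "unexpected": [], "shape_mismatch": []}
    for name, target in model_state.items():
        if name not in checkpoint_state:
            # the model expects this parameter, but the checkpoint lacks it
            report["missing"].append(name)
        elif tuple(checkpoint_state[name].shape) != tuple(target.shape):
            report["shape_mismatch"].append(
                (name, tuple(checkpoint_state[name].shape), tuple(target.shape))
            )
    # parameters stored in the checkpoint that the model does not have
    report["unexpected"] = [n for n in checkpoint_state if n not in model_state]
    return report


# Hypothetical first-stage checkpoint vs. second-stage model (made-up names and shapes).
first_stage = {"mlp.lin_out.weight": SimpleNamespace(shape=(4, 128))}
second_stage = {
    "mlp.lin_out.weight": SimpleNamespace(shape=(68, 128)),
    "mlp.lin_feats.weight": SimpleNamespace(shape=(128, 64)),
}
report = diff_state_dicts(first_stage, second_stage)
print(report["shape_mismatch"])  # [('mlp.lin_out.weight', (4, 128), (68, 128))]
print(report["missing"])         # ['mlp.lin_feats.weight']
```

With real objects, the call would be diff_state_dicts(data["model"], model.state_dict()).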