[Question] Resume training from a checkpoint

facebookresearch / localrf

An algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video.

MIT License

956 stars 62 forks

[Question] Resume training from a checkpoint #34

Closed sevashasla closed 10 months ago

sevashasla commented 10 months ago

Hello! Thank you for the great work. I had been training a model for approximately 20 hours when my computer hit an error and the training stopped. Is there a way to resume training from the checkpoint? I saw the line "TODO: Add midpoint loading" and the commented-out code after it. I could try to implement it myself; could you please point out the potential problems?

ameuleman commented 10 months ago

Hi, I will not have time to look into it this week. In addition to loading checkpoints (see here), we need to handle the training dataloader train_dataset properly so that it provides the images that are currently being trained on. I would use local_tensorfs.blending_weights[:, -1] > 0 to determine which frames should be activated or deactivated in the training dataset.
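For readers following along, here is a minimal sketch of the frame-selection idea from the comment above. It is not code from the repository. It assumes that blending_weights has one row per frame and one column per local radiance field, so that the last column belongs to the field currently being optimized. Plain Python lists stand in for the tensor so the snippet runs on its own; the helper names active_frame_mask and frames_to_train and the toy weights are hypothetical.

```python
def active_frame_mask(blending_weights):
    """Pure-Python stand-in for `blending_weights[:, -1] > 0`.

    Assumed layout (not confirmed by the thread): one row per frame,
    one column per local radiance field, last column = current field.
    """
    return [row[-1] > 0 for row in blending_weights]


def frames_to_train(blending_weights):
    """Return the indices of frames with a positive weight in the last field."""
    mask = active_frame_mask(blending_weights)
    return [idx for idx, is_active in enumerate(mask) if is_active]


# Toy data: 5 frames, 2 local radiance fields (values are made up).
toy_weights = [
    [1.0, 0.0],  # frame 0: only in the first field
    [1.0, 0.0],  # frame 1: only in the first field
    [0.5, 0.5],  # frame 2: overlap region between the two fields
    [0.0, 1.0],  # frame 3: only in the last field
    [0.0, 1.0],  # frame 4: only in the last field
]

active = frames_to_train(toy_weights)
print(active)
```

After restoring a checkpoint, the resulting indices could then drive whatever activation mechanism train_dataset exposes; frames outside the list would be deactivated. How the dataset actually toggles frames is not described in this thread, so that step is left out.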

sevashasla commented 10 months ago

Thank you for your fast answer!