bennyguo / instant-nsr-pl

Neural Surface reconstruction based on Instant-NGP. Efficient and customizable boilerplate for your research projects. Train NeuS in 10min!

Val set overfitting (due to incorrect dataset switching) #31

Open SevenLJY opened 1 year ago

SevenLJY commented 1 year ago

Hi Yuanchen,

When training the model on my own data, I noticed that it overfits my validation set. So I printed out the source split for each batch and found that self.dataset never switches back to train_dataloader().dataset after the first validation pass finishes. In other words, after the first validation pass, self.dataset remains val_dataloader().dataset, so the model keeps training on my val set, which causes the overfitting.

I think the reason is that you manually switch self.dataset inside on_train_start(), which is called only once, at the very beginning, before the training loop starts. So the training loop never gets the training set back once on_validation_start() is called.
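For reference, here is a minimal sketch of the flow as I understand it (the hook names are real Lightning hooks; the class layout is simplified and not the exact repo code):

```python
import pytorch_lightning as pl

class BaseSystem(pl.LightningModule):
    def on_train_start(self):
        # Called ONCE before the whole fit loop, so this assignment
        # is never re-applied after validation overwrites it.
        self.dataset = self.trainer.datamodule.train_dataloader().dataset

    def on_validation_start(self):
        # Called before EVERY validation pass; from the first
        # validation onward, self.dataset points at the val split,
        # and all subsequent training batches read from it.
        self.dataset = self.trainer.datamodule.val_dataloader().dataset
```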

For now, my quick fix is moving self.dataset = self.trainer.datamodule.train_dataloader().dataset to on_train_batch_start(), but it is not very efficient. Otherwise, I think the datamodule needs some refactoring (e.g., moving self.preprocess_data() into the dataloader) to avoid the manual switch.
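Concretely, the quick fix looks like this (a sketch rather than my exact code; the hook signature follows recent Lightning versions):

```python
def on_train_batch_start(self, batch, batch_idx):
    # Re-point self.dataset at the train split before every training
    # batch, undoing whatever the last validation pass assigned.
    # Correct, but the dataloader lookup is repeated on every batch.
    self.dataset = self.trainer.datamodule.train_dataloader().dataset
```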

bennyguo commented 1 year ago

Hi! Thanks for pointing this out! I've temporarily fixed this problem by moving the dataset assignment to the on_X_batch_start hooks in BaseSystem. The reason a large part of the data processing happens outside the datamodule is that generating rays on the fly is more efficient than generating all rays at once, which would cost a lot of memory and take time indexing large arrays. I'm also interested in whether there is a more elegant way of doing this than manually switching.
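Roughly, the temporary fix looks like this (a sketch of the idea, not the exact commit; note that the val/test batch-start hooks also receive a dataloader index, and older Lightning 1.x versions pass an extra index to the train hook as well):

```python
import pytorch_lightning as pl

class BaseSystem(pl.LightningModule):
    def on_train_batch_start(self, batch, batch_idx):
        self.dataset = self.trainer.datamodule.train_dataloader().dataset

    def on_validation_batch_start(self, batch, batch_idx, dataloader_idx=0):
        self.dataset = self.trainer.datamodule.val_dataloader().dataset

    def on_test_batch_start(self, batch, batch_idx, dataloader_idx=0):
        self.dataset = self.trainer.datamodule.test_dataloader().dataset
```

Since each hook fires at the start of every batch, the active split can never go stale; the per-batch lookup is a small cost compared to ray generation itself.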