Open · SevenLJY opened 1 year ago
Hi! Thanks for pointing this out! I've temporarily fixed this problem by moving the dataset assignment to the `on_X_batch_start` hooks in `BaseSystem`. The reason a large part of the data processing happens outside the datamodule is that generating rays on the fly is more efficient than generating all rays at once, which would cost a lot of memory and take time indexing large arrays. I'm also interested in whether there is a more elegant way of doing this than switching manually.
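Concretely, the temporary fix looks roughly like this. This is only a minimal sketch using PyTorch Lightning's standard batch-start hooks; the exact hook signatures in `BaseSystem` may differ depending on the Lightning version:

```python
import pytorch_lightning as pl


class BaseSystem(pl.LightningModule):
    # Re-point self.dataset at the start of every batch so that each loop
    # (train / val / test) always reads from its own split.
    def on_train_batch_start(self, batch, batch_idx):
        self.dataset = self.trainer.datamodule.train_dataloader().dataset

    def on_validation_batch_start(self, batch, batch_idx, dataloader_idx=0):
        self.dataset = self.trainer.datamodule.val_dataloader().dataset

    def on_test_batch_start(self, batch, batch_idx, dataloader_idx=0):
        self.dataset = self.trainer.datamodule.test_dataloader().dataset
```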
Hi Yuanchen,
When I train the model on my own data, I noticed the model was overfitting to my validation set. So I printed out the split source for each batch and found that `self.dataset` didn't switch back to `train_dataloader().dataset` after the first validation run finished. In other words, after the first validation run, `self.dataset` remains `val_dataloader().dataset`, so the model keeps training on my val set, which causes the overfitting.

I think the reason is that you manually switch `self.dataset` inside `on_train_start()`, which is called only once at the very beginning, outside the training loop. So the training loop has no chance to use the training set once `on_val_start()` is called.
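To make the failure mode concrete, the switching behaves roughly like this (a minimal sketch using Lightning's standard hook names; `on_validation_start` here corresponds to what I called `on_val_start()` above, and the actual code may differ):

```python
import pytorch_lightning as pl


class BaseSystem(pl.LightningModule):
    def on_train_start(self):
        # Called exactly once, before the first training batch.
        self.dataset = self.trainer.datamodule.train_dataloader().dataset

    def on_validation_start(self):
        # Called before every validation run; after the first one, nothing
        # ever points self.dataset back to the train split, so subsequent
        # training steps keep reading from the val dataset.
        self.dataset = self.trainer.datamodule.val_dataloader().dataset
```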
For now, my quick fix is moving `self.dataset = self.trainer.datamodule.train_dataloader().dataset` to `on_train_batch_start()`, but it is not very efficient. Otherwise, I think we need to refactor the datamodule somehow (move `self.preprocess_data()` into the dataloader) to avoid the manual switch.
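To sketch what I mean by the refactor: each split's dataset could own its data and generate rays on the fly in `__getitem__`, so nothing has to reassign `self.dataset`. This is only illustrative; `PerSplitDataset`, `get_rays`, and the field names are placeholders, not the project's actual API:

```python
import torch
from torch.utils.data import Dataset


def get_rays(c2w, intrinsics, H, W):
    # Placeholder pinhole-camera ray generation for a single view; the sign
    # conventions depend on the camera coordinate system actually used.
    fx, fy, cx, cy = intrinsics
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(i - cx) / fx, (j - cy) / fy, torch.ones_like(i)], dim=-1)
    rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(-1)  # rotate into world frame
    rays_o = c2w[:3, 3].expand(rays_d.shape)              # camera origin per pixel
    return rays_o, rays_d


class PerSplitDataset(Dataset):
    """Hypothetical per-split dataset: each split owns its images/poses and
    generates rays on the fly, so the system never reassigns self.dataset."""

    def __init__(self, images, poses, intrinsics):
        self.images = images          # (N, H, W, 3) images for this split only
        self.poses = poses            # (N, 4, 4) camera-to-world matrices
        self.intrinsics = intrinsics  # (fx, fy, cx, cy) shared by the split

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        H, W = self.images[idx].shape[:2]
        rays_o, rays_d = get_rays(self.poses[idx], self.intrinsics, H, W)
        return {"rays_o": rays_o, "rays_d": rays_d, "rgb": self.images[idx]}
```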