alievk / npbg

Neural Point-Based Graphics
MIT License

Scene.program is None #19

Closed: chekirou closed this issue 3 years ago

chekirou commented 3 years ago

Hi, I am trying to fit a scene and I have a problem with the dataloader. At the second epoch, even though the dataset is loaded, scene.program seems to be None. Do you have any idea where the problem could be?

Here is the error message:

```
EPOCH 1

TRAIN
EVAL MODE IN TRAIN
model parameters: 1928771
running on datasets [0]
proj_matrix was not set
total parameters: 76715531
Traceback (most recent call last):
  File "train.py", line 517, in <module>
    train_loss = run_train(epoch, pipeline, args, iter_cb)
  File "train.py", line 253, in run_train
    return run_epoch(pipeline, 'train', epoch, args, iter_cb=iter_cb)
  File "train.py", line 228, in run_epoch
    run_sub(dl, extra_optimizer)
  File "train.py", line 118, in run_sub
    for it, data in enumerate(dl):
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\dataset.py", line 219, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "C:\Users\user\Documents\npbg\npbg\datasets\dynamic.py", line 246, in __getitem__
    input = self.renderer.render(view_matrix=view_matrix, proj_matrix=proj_matrix)
  File "C:\Users\user\Documents\npbg\npbg\datasets\dynamic.py", line 68, in render
    self.scene.set_camera_view(view_matrix)
  File "C:\Users\user\Documents\npbg\npbg\gl\programs.py", line 366, in set_camera_view
    self.program['m_view'] = inv(m).T
```
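The traceback stops at `self.program['m_view'] = inv(m).T`, so the scene the renderer is using no longer has a program. One plausible mechanism consistent with this, sketched below purely as an assumption: the dataset's unload step releases the scene (leaving its `program` as `None`), while a renderer created during the first load keeps a reference to that old scene, so the second epoch renders through a dead scene. `Scene`, `Renderer` and `Dataset` here are simplified, hypothetical stand-ins, not npbg's actual classes, and no OpenGL is involved.

```python
class Scene:
    """Hypothetical stand-in for a GL scene that owns a shader program."""

    def __init__(self):
        # In a real GL scene this would be a compiled shader program.
        self.program = {}

    def set_camera_view(self, m_view):
        # Raises TypeError once the program has been released (set to None).
        self.program["m_view"] = m_view

    def delete(self):
        # Releasing GL resources leaves the attribute as None.
        self.program = None


class Renderer:
    """Hypothetical renderer that keeps a reference to the scene it was built with."""

    def __init__(self, scene):
        self.scene = scene

    def render(self, view_matrix):
        self.scene.set_camera_view(view_matrix)
        return "frame"


class Dataset:
    """Hypothetical dataset that is loaded/unloaded around each epoch."""

    def __init__(self):
        self.scene = None
        self.renderer = None

    def load(self):
        self.scene = Scene()
        # Suspected bug: the renderer is only created once, so after a reload
        # it still points at the scene that unload() deleted.
        if self.renderer is None:
            self.renderer = Renderer(self.scene)

    def unload(self):
        self.scene.delete()
        self.scene = None

    def __getitem__(self, idx):
        return self.renderer.render(view_matrix=idx)


ds = Dataset()
ds.load()
print(ds[0])  # epoch 1 works
ds.unload()
ds.load()
try:
    ds[0]  # epoch 2 renders through the deleted scene
except TypeError as exc:
    print(f"epoch 2 failed: {exc}")
```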

chekirou commented 3 years ago

Solved by deleting the renderer in the unload function.
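A minimal sketch of what "deleting the renderer in the unload function" can look like, assuming a dataset whose `load()` builds a scene and a renderer bound to it, and whose `unload()` releases the scene. `Scene`, `Renderer` and `Dataset` are hypothetical stand-ins, not npbg's real classes. Dropping the renderer together with the scene forces the next `load()` to build a fresh renderer bound to the new scene, instead of reusing one that still points at a scene whose program was released.

```python
class Scene:
    """Hypothetical stand-in for a GL scene that owns a shader program."""

    def __init__(self):
        self.program = {}

    def set_camera_view(self, m_view):
        self.program["m_view"] = m_view

    def delete(self):
        # Releasing GL resources leaves the attribute as None.
        self.program = None


class Renderer:
    """Hypothetical renderer bound to one scene."""

    def __init__(self, scene):
        self.scene = scene

    def render(self, view_matrix):
        self.scene.set_camera_view(view_matrix)
        return "frame"


class Dataset:
    """Hypothetical dataset that is loaded/unloaded around each epoch."""

    def __init__(self):
        self.scene = None
        self.renderer = None

    def load(self):
        self.scene = Scene()
        if self.renderer is None:
            self.renderer = Renderer(self.scene)

    def unload(self):
        self.scene.delete()
        self.scene = None
        # The fix: drop the renderer as well, so the next load() rebuilds it
        # against the new scene rather than the deleted one.
        self.renderer = None

    def __getitem__(self, idx):
        return self.renderer.render(view_matrix=idx)


ds = Dataset()
for epoch in range(3):
    ds.load()
    print(epoch, ds[0])  # every epoch renders through a live scene
    ds.unload()
```

Because the renderer is rebuilt on every load, any per-renderer state (buffers, caches) is recreated each epoch too; if that state holds GPU or driver memory that is not explicitly freed, memory can accumulate across epochs.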

Shubhendu-Jena commented 2 years ago

Hi.

I'm facing the same issue. Could you please elaborate on what exactly you did to solve the issue? Which unload function are you talking about?

Thanks in advance

Shubhendu-Jena commented 2 years ago

Hi,

@alievk @seva100 do you have any pointers for solving this issue? I deleted the renderer and set it to None in the unload function (line 182 in /npbg/datasets/dynamic.py). This fixes the error and I can train beyond the first epoch, but CPU RAM usage and GPU memory usage increase continuously, so I can't train for more than 20 epochs at a time.
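One way to check whether memory really grows with each load/unload cycle, rather than with training itself, is to record heap usage after every cycle with the standard-library `tracemalloc` module. In the sketch below, `load_unload_cycle` is a hypothetical stand-in that deliberately leaks by appending to a module-level list; in practice you would call the dataset's real load and unload steps there. `tracemalloc` only sees Python allocations: memory held by the OpenGL driver or by CUDA is invisible to it and would have to be watched with `nvidia-smi` or `torch.cuda.memory_allocated()` instead.

```python
import gc
import tracemalloc

# Hypothetical leak: something keeps a reference to every buffer ever loaded.
_leaked = []


def load_unload_cycle(leak=True):
    """Hypothetical stand-in for one dataset load() ... unload() cycle."""
    buffer = bytearray(1_000_000)  # ~1 MB, like a per-epoch buffer
    if leak:
        _leaked.append(buffer)  # never released, so usage grows every cycle


def measure_growth(cycles, leak=True):
    """Return traced Python-heap usage (bytes) after each cycle."""
    tracemalloc.start()
    usage = []
    try:
        for _ in range(cycles):
            load_unload_cycle(leak)
            gc.collect()  # rule out garbage that is merely uncollected
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage


leaky = measure_growth(5, leak=True)
clean = measure_growth(5, leak=False)
print("leaky KiB:", [u // 1024 for u in leaky])
print("clean KiB:", [u // 1024 for u in clean])
```

A steadily rising series like `leaky` points at references that survive unloading; a flat series like `clean` suggests the growth is outside the Python heap (for example, GL or CUDA resources that are never explicitly released).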

Thanks in advance

seva100 commented 2 years ago

Hi @Shubhendu-Jena, unfortunately I have never run into this issue. I can only suggest saving checkpoints regularly and restarting training from the latest checkpoint whenever it fails with an out-of-memory error.
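A minimal sketch of that workaround: save a checkpoint every few epochs and, on restart, resume from the newest one. It is a generic pattern, not npbg's own checkpoint mechanism. To run with the standard library alone it uses `pickle` and a hypothetical `epoch_XXXX.pkl` naming scheme; with PyTorch you would instead save `model.state_dict()` and `optimizer.state_dict()` via `torch.save` and restore them with `torch.load`. The `train` function and its `crash_at` parameter are hypothetical, used only to simulate an out-of-memory failure.

```python
import pickle
import re
import tempfile
from pathlib import Path

CKPT_RE = re.compile(r"epoch_(\d+)\.pkl$")


def save_checkpoint(ckpt_dir, epoch, state):
    """Write the state after a finished epoch to ckpt_dir/epoch_XXXX.pkl."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"epoch_{epoch:04d}.pkl"
    tmp = path.with_suffix(".tmp")
    with open(tmp, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)
    tmp.replace(path)  # rename last, so a crash never leaves a half-written .pkl
    return path


def load_latest_checkpoint(ckpt_dir):
    """Return (next_epoch, state) from the newest checkpoint, or (0, None)."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return 0, None
    best = None
    for p in ckpt_dir.glob("epoch_*.pkl"):
        m = CKPT_RE.search(p.name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), p)
    if best is None:
        return 0, None
    with open(best[1], "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]


def train(ckpt_dir, total_epochs, save_every=5, crash_at=None):
    """Hypothetical training loop that resumes from the latest checkpoint."""
    start, state = load_latest_checkpoint(ckpt_dir)
    state = state or {"steps": 0}
    for epoch in range(start, total_epochs):
        if epoch == crash_at:
            raise MemoryError(f"simulated out-of-memory at epoch {epoch}")
        state["steps"] += 1  # stand-in for one epoch of training
        if (epoch + 1) % save_every == 0 or epoch + 1 == total_epochs:
            save_checkpoint(ckpt_dir, epoch, state)
    return state


with tempfile.TemporaryDirectory() as d:
    try:
        train(d, total_epochs=20, save_every=5, crash_at=12)
    except MemoryError as exc:
        print("crashed:", exc)
    resumed = train(d, total_epochs=20, save_every=5)  # resumes after epoch 9
    print(resumed)
```

The trade-off is `save_every`: epochs completed after the last checkpoint (here epochs 10 and 11) are redone after a restart, so a smaller value loses less work at the cost of more disk writes.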