pin error

alex-111-gh commented 3 years ago

Any idea what this problem might be, @thomasneff? I'm using the code out of the box with the pavillon dataset for training.

WARNING! - import of cuda kernels for 'disc_depth_multiclass' failed - falling back to PyTorch
Training config: lo_l1.0_0.0_SpPoDir[128]-relu0(256x8)-CD-128-5-5_l1.0_10.0_RayMarchFromPoses_nSD[2_LSfCD_128_0.0](nerf(10-4))-relu1(256x80..63-7.63.)-RGBARayMarch (../configs/DONeRF_2_samples.ini)
no Checkpoints found
no Checkpoints found
epoch=21         loss=0.13672528:   0%|                                                                                    | 20/300001 [00:03<7:07:22, 11.70it/s]
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    idx, data = r
ValueError: not enough values to unpack (expected 2, got 0)
epoch=21         loss=0.13672528:   0%|                                                                                   | 21/300001 [00:08<34:03:39,  2.45it/s]
Traceback (most recent call last):
  File "train.py", line 344, in <module>
    main()
  File "train.py", line 329, in main
    pre_train(train_config)
  File "train.py", line 136, in pre_train
    batch_iterator = iter(train_config.train_data_loader)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 349, in __iter__
    self._iterator._reset(self)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 852, in _reset
    data = self._get_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1029, in _get_data
    raise RuntimeError('Pin memory thread exited unexpectedly')
RuntimeError: Pin memory thread exited unexpectedly

thomasneff commented 3 years ago

Hi!

I think this is probably an issue with your PyTorch version. If I remember correctly, I had the same issue with PyTorch 1.7.1, and updating to 1.8 fixed it. So try updating and see if that helps!
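
A quick way to see whether an environment might be affected is to compare the installed PyTorch version against 1.8. The snippet below is only a minimal sketch and is not part of the DONeRF code. The helper names `version_tuple` and `needs_upgrade` are hypothetical, and the 1.8 threshold is an assumption taken from the recollection above, not from a confirmed changelog entry.

```python
import re


def version_tuple(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.7.1+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(version: str, minimum: tuple = (1, 8)) -> bool:
    """Return True if `version` is older than `minimum` (assumed threshold: 1.8)."""
    return version_tuple(version) < minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if needs_upgrade(torch.__version__):
            print(f"PyTorch {torch.__version__} is older than 1.8; consider updating")
        else:
            print(f"PyTorch {torch.__version__} is 1.8 or newer")
```

Comparing integer tuples instead of raw strings avoids ordering "1.10" before "1.8", and the regular expression ignores build suffixes such as `+cu110`.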

alex-111-gh commented 3 years ago

Thanks! The problem is indeed solved: no crashes after the 20th epoch. Fingers crossed that it continues to train.

facebookresearch / DONERF

pin error #7