cvlab-kaist / GaussianTalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn and Seungryong Kim

RuntimeError: DataLoader worker exited unexpectedly #42

Open pegahs1993 opened 3 months ago

pegahs1993 commented 3 months ago

I am encountering an issue while training my model using PyTorch's DataLoader. The training process abruptly terminates with the following error:

RuntimeError: DataLoader worker (pid(s) xxxx) exited unexpectedly

Error Traceback:

Training progress:   0% 0/10000 [00:00<?, ?it/s]data loading done [20/08 20:48:14]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 113, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.10/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 12851) is killed by signal: Killed. 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/GaussianTalker/train.py", line 408, in <module>
    training(lp.extract(args), hp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.expname, args.use_wandb)
  File "/content/GaussianTalker/train.py", line 282, in training
    scene_reconstruction(dataset, opt, hyper, pipe, testing_iterations, saving_iterations,
  File "/content/GaussianTalker/train.py", line 126, in scene_reconstruction
    viewpoint_cams = next(loader)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1295, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 12851) exited unexpectedly
Training progress:   0% 0/10000 [00:41<?, ?it/s]

jianglingling007 commented 3 months ago

You can reduce the DataLoader's num_workers value; the minimum is 0.
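
For context, "killed by signal: Killed" in the traceback usually means the worker received SIGKILL, which on Linux often comes from the out-of-memory killer. Each DataLoader worker is a separate process, so fewer workers generally means less memory pressure, and with num_workers=0 the data is loaded in the main process. A minimal sketch of that idea follows. The helper names safe_num_workers and build_loader are hypothetical and not part of GaussianTalker, and where train.py sets its own worker count is not shown in this thread.

```python
import os


def safe_num_workers(requested, cpu_count=None):
    """Clamp a requested worker count to the range [0, available CPUs].

    Hypothetical helper for illustration; not part of GaussianTalker.
    """
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_count = os.cpu_count() or 1
    return max(0, min(requested, cpu_count))


def build_loader(dataset, batch_size=1, num_workers=0):
    """Build a DataLoader; num_workers=0 loads data in the main process."""
    # Imported here so safe_num_workers stays usable without PyTorch installed.
    from torch.utils.data import DataLoader

    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )


# Example usage (assumes `my_dataset` is an existing torch Dataset):
# loader = build_loader(my_dataset, num_workers=safe_num_workers(4))
# If workers are still killed, try num_workers=0.
```

Starting from 0 and increasing the count only while memory allows is a conservative way to find a value that works on a memory-constrained machine.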

pegahs1993 commented 3 months ago

Thanks! @jianglingling007