AceCoooool / DSS-pytorch

PyTorch implementation of Deeply Supervised Salient Object Detection with Short Connections
MIT License
173 stars, 53 forks

RuntimeError: DataLoader worker (pid 16126) exited unexpectedly with exit code 1 #5

Open Kyle0936 opened 6 years ago

Kyle0936 commented 6 years ago

Sorry to bother you, but I ran into another issue when I tried to train on my own dataset:

    python main.py --mode='train' --train_path='data/images' --label_path='data/ground_truth_mask' --batch_size=8 --visdom=False

It seemed to run correctly at first, but then this error appeared:

    ......
    The number of parameters: 62238175
    /Users/kyle/Documents/MATLAB/DSS-py/solver.py:138: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
      utils.clip_grad_norm(self.net.parameters(), self.config.clip_gradient)
    /Users/kyle/Documents/MATLAB/DSS-py/solver.py:141: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
      loss_epoch += loss.cpu().data[0]
    /Users/kyle/Documents/MATLAB/DSS-py/solver.py:143: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
      epoch, self.config.epoch, i, iter_num, loss.cpu().data[0]))
    epoch: [0/500], iter: [0/3], loss: [4.8447]
    /Users/kyle/Documents/MATLAB/DSS-py/solver.py:145: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
      error = OrderedDict([('loss:', loss.cpu().data[0])])
    epoch: [0/500], iter: [1/3], loss: [4.8428]
    epoch: [0/500], iter: [2/3], loss: [4.8411]
    thread_monitor No such process in pthread_detach
    Traceback (most recent call last):
      File "main.py", line 86, in <module>
        main(config)
      File "main.py", line 25, in main
        train.train()
      File "/Users/kyle/Documents/MATLAB/DSS-py/solver.py", line 127, in train
        for i, data_batch in enumerate(self.train_loader):
      File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
        idx, batch = self._get_batch()
      File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
        return self.data_queue.get()
      File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/queues.py", line 335, in get
        res = self._reader.recv_bytes()
      File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
      File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in handler
        _error_if_any_worker_fails()
    RuntimeError: DataLoader worker (pid 16126) exited unexpectedly with exit code 1.

I really appreciate your generous help!
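A note on reading the traceback above: the RuntimeError comes from the main process, so it only reports that a worker died, not why. One way to surface the underlying exception, if the worker failed on a Python error while loading a sample, is to index the dataset directly in the main process. The sketch below assumes only that the dataset object supports `len()` and integer indexing, as a map-style PyTorch dataset does; `find_first_bad_sample` is a hypothetical helper name, not part of this repository.

```python
import traceback


def find_first_bad_sample(dataset, limit=None):
    """Index `dataset` in the current process and report the first failure.

    Returns (index, exception) for the first sample that raises,
    or None if every checked sample loads without an exception.
    """
    total = len(dataset)
    count = total if limit is None else min(limit, total)
    for index in range(count):
        try:
            dataset[index]  # same call a DataLoader worker would make
        except Exception as exc:
            # Print the full traceback that the worker process would have hidden.
            traceback.print_exc()
            return index, exc
    return None
```

Calling it on the dataset that is normally passed to the `DataLoader` should print the real loading error, if there is one. Setting `num_workers=0` on the `DataLoader` has a similar effect, because loading then happens in the main process.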

AceCoooool commented 6 years ago

Please check your image and label directories: are there any non-image files in them? (I ran the training on my computer and did not get the error you describe.)
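The check suggested here can be sketched in a few lines of Python. This is a minimal example using only the standard library; the set of accepted extensions is an assumption, not something this repository defines, and `find_non_image_entries` is a hypothetical helper name.

```python
import os

# Assumed set of image extensions; adjust to whatever your data actually uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_non_image_entries(directory):
    """Return the names in `directory` that do not look like image files.

    Hidden files (such as .DS_Store), subdirectories, and files with an
    unexpected extension are all reported.
    """
    suspicious = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        extension = os.path.splitext(name)[1].lower()
        if name.startswith(".") or os.path.isdir(path) or extension not in IMAGE_EXTENSIONS:
            suspicious.append(name)
    return suspicious
```

For example, `find_non_image_entries('data/images')` and `find_non_image_entries('data/ground_truth_mask')` should both return an empty list for a clean dataset. Note that `os.listdir` never returns the `.` and `..` entries, so they do not need special handling.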

Kyle0936 commented 6 years ago
[Two screenshots, taken 2018-07-25: directory listings of the image set and the ground truth set]

Here are my image set and ground truth set. I also used "ls -a" to check, and I have removed '.DS_Store'. The only non-picture entries left are '.' and '..', which clearly should not be deleted. Are there any requirements for the names or formats of the images?
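On the naming question, one thing worth ruling out is a mismatch between the two directories. The sketch below assumes, without confirmation from this repository's loader, that each image is expected to have a ground truth file with the same base name (the extension may differ). `unmatched_stems` is a hypothetical helper name.

```python
import os


def unmatched_stems(image_dir, label_dir):
    """Compare base names (file names without extension) in two directories.

    Returns a pair of sorted lists: base names found only in `image_dir`,
    and base names found only in `label_dir`. Hidden files are ignored.
    """
    def stems(directory):
        return {
            os.path.splitext(name)[0]
            for name in os.listdir(directory)
            if not name.startswith(".")
        }

    images = stems(image_dir)
    labels = stems(label_dir)
    return sorted(images - labels), sorted(labels - images)
```

If `unmatched_stems('data/images', 'data/ground_truth_mask')` returns two empty lists, every image has a same-named mask and vice versa, which would rule out this particular cause under the assumption above.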