Zhongdao / gcn_clustering

Code for CVPR'19 paper Linkage-based Face Clustering via GCN
MIT License

about training error. #9

Open xxx2974 opened 5 years ago

xxx2974 commented 5 years ago

Hello, when I try to train the program, I get some errors. How can I solve them?

The details:

Current lr 0.01
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    main(args)
  File "train.py", line 64, in main
    train(trainloader, net, criterion, opt, epoch)
  File "train.py", line 81, in train
    for i, ((feat, adj, cid, h1id), gtmat) in enumerate(loader):
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 31687) is killed by signal: Killed.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 71, in _worker_manager_loop
    r = in_queue.get()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/queues.py", line 378, in get
    return recv()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
    conn = Client(address, authkey=current_process().authkey)
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/connection.py", line 432, in answer_challenge
    message = connection.recv_bytes(256) # reject large message
IOError: [Errno 104] Connection reset by peer

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f77d1c11650>> ignored