fastai / course-v3

The 3rd edition of course.fast.ai
https://course.fast.ai/
Apache License 2.0

lesson-1 DataLoader worker (pid 8362) is killed by signal: Illegal instruction. #214

Closed HomeLH closed 5 years ago

HomeLH commented 5 years ago

When I try to run the lesson-1 Jupyter notebook, I hit a problem when running learn.fit_one_cycle(4). The detailed debug info is as follows:

RuntimeError                              Traceback (most recent call last)
in
----> 1 learn.fit_one_cycle(4)

~/anaconda3/envs/step/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/anaconda3/envs/step/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    170         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    171         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 172             callbacks=self.callbacks+callbacks)
    173
    174     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/envs/step/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96

~/anaconda3/envs/step/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     80             cb_handler.on_epoch_begin()
     81
---> 82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
     84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)

~/anaconda3/envs/step/lib/python3.6/site-packages/fastprogress/fastprogress.py in __iter__(self)
     63         self.update(0)
     64         try:
---> 65             for i,o in enumerate(self._gen):
     66                 yield o
     67                 if self.auto_update: self.update(i+1)

~/anaconda3/envs/step/lib/python3.6/site-packages/fastai/basic_data.py in __iter__(self)
     68     def __iter__(self):
     69         "Process and returns items from `DataLoader`."
---> 70         for b in self.dl: yield self.proc_batch(b)
     71
     72     @classmethod

~/anaconda3/envs/step/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    629         while True:
    630             assert (not self.shutdown and self.batches_outstanding > 0)
--> 631             idx, batch = self._get_batch()
    632             self.batches_outstanding -= 1
    633             if idx != self.rcvd_idx:

~/anaconda3/envs/step/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _get_batch(self)
    608             # need to call `.task_done()` because we don't use `.join()`.
    609         else:
--> 610             return self.data_queue.get()
    611
    612     def __next__(self):

~/anaconda3/envs/step/lib/python3.6/multiprocessing/queues.py in get(self, block, timeout)
     92         if block and timeout is None:
     93             with self._rlock:
---> 94                 res = self._recv_bytes()
     95             self._sem.release()
     96         else:

~/anaconda3/envs/step/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

~/anaconda3/envs/step/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

~/anaconda3/envs/step/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

~/anaconda3/envs/step/lib/python3.6/site-packages/torch/utils/data/dataloader.py in handler(signum, frame)
    272         # This following call uses `waitid` with WNOHANG from C side. Therefore,
    273         # Python can still get and update the process status successfully.
--> 274         _error_if_any_worker_fails()
    275         if previous_handler is not None:
    276             previous_handler(signum, frame)

RuntimeError: DataLoader worker (pid 8362) is killed by signal: Illegal instruction.
jph00 commented 5 years ago

Please use the forum.