fastai / course-v3

The 3rd edition of course.fast.ai
https://course.fast.ai/
Apache License 2.0

problem in notebook 12c_ulmfit #572

Closed AFanaei closed 3 years ago

AFanaei commented 3 years ago

When running the 12c_ulmfit notebook to train a classifier, I got the following error:

course-v3\nbs\dl2\exp\nb_09b.py in one_batch(self, i, xb, yb)
     41             if not self.in_train: return
---> 42             self.loss.backward();                           self('after_backward')
     43             self.opt.step();                                self('after_step')

course-v3\.env\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    194         """
--> 195         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    196 

course-v3\.env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     92 
---> 93     grad_tensors = _make_grads(tensors, grad_tensors)
     94     if retain_graph is None:

course-v3\.env\lib\site-packages\torch\autograd\__init__.py in _make_grads(outputs, grads)
     34                     raise RuntimeError("grad can be implicitly created only for scalar outputs")
---> 35                 new_grads.append(torch.ones_like(out, memory_format=torch.preserve_format))
     36             else:

RuntimeError: CUDA error: device-side assert triggered

It fails whether pretraining is used or not.
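For context (a general PyTorch fact, not specific to this notebook): a `CUDA error: device-side assert triggered` raised during `backward` on a classification loss is very often an out-of-range target index — the loss kernel asserts when a label value is `>= n_out`. Rerunning on CPU, or with `CUDA_LAUNCH_BLOCKING=1`, usually surfaces the clearer error message. A stdlib-only sketch of the sanity check, with a hypothetical helper name:

```python
def find_bad_labels(labels, n_out):
    """Return the label values that would trip the CUDA assert:
    a classification head with n_out outputs only accepts integer
    targets in the range [0, n_out - 1]."""
    return sorted({y for y in labels if not (0 <= y < n_out)})

# Example: a 2-class head fed a dataset that actually has 3 categories.
labels = [0, 1, 2, 1, 0, 2]
print(find_bad_labels(labels, n_out=2))   # [2] -> would assert on GPU
print(find_bad_labels(labels, n_out=3))   # []  -> safe
```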

AFanaei commented 3 years ago

I managed to find the problem: the output size of the layer after pooling did not match the number of categories in the dataset (I was running the notebook with my own dataset). After changing the number of outputs, the problem was fixed:

emb_sz, nh, nl = 100, 100, 1                       # embedding size, hidden size, number of LSTM layers
dps = tensor([0.4, 0.3, 0.4, 0.05, 0.5]) * 0.25    # dropout probabilities
tok_pad = vocab.index(PAD)
n_out = len(proc_cat.vocab)                        # one output per category in the dataset
model = get_text_classifier(len(vocab), emb_sz, nh, nl, n_out, tok_pad, bptt, *dps)
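The key line is `n_out = len(proc_cat.vocab)`: the classifier head must have exactly one output per distinct category, and the label ids must fall in `[0, n_out - 1]`. A stdlib-only sketch of what the category processor's vocab amounts to (`build_label_vocab` is a hypothetical stand-in, not fastai's API):

```python
def build_label_vocab(labels):
    """Hypothetical stand-in for a category processor's vocab:
    map each distinct category to an integer id."""
    vocab = sorted(set(labels))
    o2i = {c: i for i, c in enumerate(vocab)}
    return vocab, o2i

labels = ["pos", "neg", "neu", "pos"]
vocab, o2i = build_label_vocab(labels)
n_out = len(vocab)     # analogous to len(proc_cat.vocab)
print(n_out)           # 3
print(o2i)             # every id lies in [0, n_out - 1], safe for the loss
```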