Closed abdallah197 closed 4 years ago
Can you re-run the notebook with this line at the beginning?
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
It'll give a more informative error than the RuntimeError.
Also, does this happen on the first batch or after training for a few batches?
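A minimal sketch of setting the flag, in case it helps. As far as I know the variable is read when CUDA initializes, so the safest place is the very first notebook cell, before torch or fastai is imported. The helper name `enable_cuda_launch_blocking` is made up for illustration and is not part of any library.

```python
import os

def enable_cuda_launch_blocking(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper: with this flag, CUDA kernels run synchronously,
    so the Python traceback points at the call that actually failed.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]

# Run this in the first cell, before `import torch` or any fastai import.
enable_cuda_launch_blocking()
```

If the kernel has already touched the GPU, restarting it before setting the flag is probably needed for the change to take effect.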
@bearpelican That's the error after I set os.environ['CUDA_LAUNCH_BLOCKING'] = '1':
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-6-495233eaf2b4> in <module>
----> 1 learn.fit_one_cycle(4)
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
21 callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
22 final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 23 learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
24
25 def fit_fc(learn:Learner, tot_epochs:int=1, lr:float=defaults.lr, moms:Tuple[float,float]=(0.95,0.85), start_pct:float=0.72,
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
198 else: self.opt.lr,self.opt.wd = lr,wd
199 callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
--> 200 fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
201
202 def create_opt(self, lr:Floats, wd:Floats=0.)->None:
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
99 for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
100 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
102 if cb_handler.on_batch_end(loss): break
103
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
24 if not is_listy(xb): xb = [xb]
25 if not is_listy(yb): yb = [yb]
---> 26 out = model(*xb)
27 out = cb_handler.on_loss_begin(out)
28
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
98 def forward(self, input):
99 for module in self:
--> 100 input = module(input)
101 return input
102
~/anaconda3/envs/musicautobot/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/GW/Health-Corpus/work/nn/musicautobot/musicautobot/music_transformer/model.py in forward(self, x)
29
30 bs,x_len = x.size()
---> 31 inp = self.drop_emb(self.encoder(x) + benc) #.mul_(self.d_model ** 0.5)
32 m_len = self.hidden[0].size(1) if hasattr(self, 'hidden') and len(self.hidden[0].size()) > 1 else 0
33 seq_len = m_len + x_len
RuntimeError: CUDA error: device-side assert triggered
@bearpelican A side note: training starts normally and crashes every time at about 20% of the first epoch.
Looks like it's failing on the encoder. That usually happens when you have a token that is out of range of the vocab/embedding size. Are you training on custom data?
Try looping through your data to make sure tokens are within range:
for i in data.train_ds:
    assert i[0].data.max() < len(learn.data.vocab)
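The assert above stops at the first bad item. A variant that scans everything and reports each offending index may be more useful for tracking down the files. This is only a sketch on toy data: `find_out_of_range` is a hypothetical helper, and it assumes each item is a flat sequence of integer token ids.

```python
def find_out_of_range(token_seqs, vocab_size):
    """Return (index, max_token) for every sequence with a token >= vocab_size."""
    offenders = []
    for idx, tokens in enumerate(token_seqs):
        # default=-1 lets empty sequences pass without raising
        biggest = max((int(t) for t in tokens), default=-1)
        if biggest >= vocab_size:
            offenders.append((idx, biggest))
    return offenders

# Toy data standing in for the real dataset: three sequences, vocab size 312.
toy = [[1, 5, 311], [2, 400, 7], []]
print(find_out_of_range(toy, 312))  # prints [(1, 400)]
```

With the notebook's objects this would presumably be called as find_out_of_range((i[0].data for i in data.train_ds), len(learn.data.vocab)), provided i[0].data is one-dimensional; a multi-dimensional array would need flattening first.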
It raises an AssertionError. The data I used was the Lakh MIDI dataset. What would be a suggested fix in this situation? Should I clip the tokens that are larger than the embedding size (which is 312 when printed), or is there a way to extend the embedding size?
By default, the tokens should be clipped by duration, so I'm not sure why you are getting an out-of-bounds error. Have you tried checking whether the data was encoded correctly?
data.train_ds[idx][0].play()
where idx is the index of the file that fails the assertion. If the playback doesn't sound right, then something must be off.
One way, as you suggested, is to increase the embedding length.
Currently the embedding length is calculated from the vocab length: model = get_language_model(arch, len(data.vocab.itos), config=config, drop_mult=drop_mult)
To handle longer note durations, you can increase the default DUR_SIZE, and the vocab will adjust accordingly.
Unfortunately these settings are hardcoded at the moment.
@bearpelican It seems some of the MIDI files had an unusual embedding length. One solution that worked was to eliminate them; the other was to run the preprocessing notebook, although I'm not sure which fixes were applied there.
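For anyone hitting the same problem, eliminating the offending files could look roughly like the sketch below. It is an illustration rather than the code actually used in this thread: `split_files_by_vocab` and the file names are hypothetical, and it assumes you already have a mapping from file name to a flat list of encoded token ids.

```python
def split_files_by_vocab(encoded, vocab_size):
    """Partition file names into (keep, drop) by whether every token fits the vocab.

    `encoded` maps a file name to its flat list of integer token ids.
    """
    keep, drop = [], []
    for name, tokens in encoded.items():
        in_range = all(0 <= int(t) < vocab_size for t in tokens)
        (keep if in_range else drop).append(name)
    return keep, drop

# Hypothetical file names and token ids, with a vocab size of 312.
encoded = {"a.mid": [1, 2, 300], "b.mid": [4, 999], "c.mid": [0]}
keep, drop = split_files_by_vocab(encoded, 312)
print(keep)  # prints ['a.mid', 'c.mid']
print(drop)  # prints ['b.mid']
```

Dropping files loses data, so it may be worth listening to a few of the dropped items first to see whether they were encoded incorrectly or simply contain very long notes.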
Hi, I have run into an error trying to replicate the train.ipynb notebook for the music transformer. I installed the library using the instructions in the repo and tried to run the notebook. The error: