SortAnon / ControllableTalkNet

A web app that lets you play around with TalkNet models
GNU Affero General Public License v3.0

Debugging issues in `backward_extractor` #40

Open mmmmllll1 opened 1 year ago

mmmmllll1 commented 1 year ago

When I run the `# Extract phoneme duration` step of the TalkNet_Training_Offline notebook, I get seemingly random errors in the `backward_extractor` function. See the output below:

[NeMo I 2023-05-29 13:30:11 features:252] PADDING: 1
[NeMo I 2023-05-29 13:30:11 features:262] STFT using conv
[NeMo I 2023-05-29 13:30:12 modelPT:439] Model EncDecCTCModel was successfully restored from /home/mmmmllll1/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo.
[NeMo I 2023-05-29 13:30:13 collections:173] Dataset loaded with 642 files totalling 0.67 hours
[NeMo I 2023-05-29 13:30:13 collections:174] 0 files were filtered totalling 0.00 hours
114/642 [00:48<02:57, 2.98it/s]
AssertionError                            Traceback (most recent call last)
Cell In[18], line 94
     91 target_tokens = preprocess_tokens(seq_ids, blank_id)
     93 f, p = forward_extractor(target_tokens, log_probs, blank_id)
---> 94 durs = backward_extractor(f, p)
     96 dur_key = Path(dl.dataset.collection[sample_idx].audio_file).stem
     97 dur_data[dur_key] = {
     98     'blanks': torch.tensor(durs[::2], dtype=torch.long).cpu().detach(), 
     99     'tokens': torch.tensor(durs[1::2], dtype=torch.long).cpu().detach()
    100 }

Cell In[18], line 45, in backward_extractor(f, p)
     43     t -= 1
     44 assert durs.shape[0] == n
---> 45 assert np.sum(durs) == m
     46 assert np.all(durs[1::2] > 0)
     47 return durs

IndexError                                Traceback (most recent call last)
Cell In[20], line 94
     91 target_tokens = preprocess_tokens(seq_ids, blank_id)
     93 f, p = forward_extractor(target_tokens, log_probs, blank_id)
---> 94 durs = backward_extractor(f, p)
     96 dur_key = Path(dl.dataset.collection[sample_idx].audio_file).stem
     97 dur_data[dur_key] = {
     98     'blanks': torch.tensor(durs[::2], dtype=torch.long).cpu().detach(), 
     99     'tokens': torch.tensor(durs[1::2], dtype=torch.long).cpu().detach()
    100 }

Cell In[20], line 41, in backward_extractor(f, p)
     39     s, t = n - 1, m
     40 while s > 0:
---> 41     durs[s - 1] += 1
     42     s -= p[s, t]
     43     t -= 1

IndexError: index 4720093899646973286 is out of bounds for axis 0 with size 49

I'm not sure how to debug what is causing these errors. I assume something is wrong with my training input?
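
One way to narrow this down might be to stop the loop from aborting on the first bad clip and instead record which files fail, along with their token and frame counts. The sketch below is only an illustration under assumptions: `collect_failures` and `_demo_extract` are hypothetical names that are not part of the notebook, and the demo data is made up purely to exercise the wrapper. It catches the two exception types seen in the tracebacks above so the failing clips can be inspected afterwards.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical sample layout: (key, target_tokens, log_probs), where
# len(log_probs) is the number of audio frames.
Sample = Tuple[str, Sequence[int], Sequence[Sequence[float]]]
Failure = Tuple[str, str, int, int]


def collect_failures(
    samples: Iterable[Sample],
    extract: Callable[[Sequence[int], Sequence[Sequence[float]]], List[int]],
) -> Tuple[Dict[str, List[int]], List[Failure]]:
    """Run `extract` on every sample and record failures instead of stopping."""
    durations: Dict[str, List[int]] = {}
    failures: List[Failure] = []
    for key, tokens, log_probs in samples:
        try:
            durations[key] = extract(tokens, log_probs)
        except (AssertionError, IndexError) as err:
            # Keep the clip name, the error type, and both lengths so that
            # short or mismatched clips stand out when reviewing the list.
            failures.append((key, type(err).__name__, len(tokens), len(log_probs)))
    return durations, failures


def _demo_extract(tokens, log_probs):
    # Stand-in for the real alignment, used only to demonstrate the wrapper.
    if len(log_probs) < len(tokens):
        raise AssertionError("fewer frames than tokens")
    return [1] * len(tokens)


demo_samples = [
    ("ok_clip", [0, 1, 0], [[0.0, 0.0]] * 5),
    ("short_clip", [0, 1, 0, 1, 0], [[0.0, 0.0]] * 2),
]
durs, failed = collect_failures(demo_samples, _demo_extract)
for key, err_name, n_tokens, n_frames in failed:
    print(f"{key}: {err_name} (tokens={n_tokens}, frames={n_frames})")
```

In the notebook, `extract` could presumably be something like `lambda tokens, lp: backward_extractor(*forward_extractor(tokens, lp, blank_id))`, matching the calls shown in the tracebacks. If the failing clips turn out to have very few frames relative to their token count, or transcripts that do not match the audio, that would point at the training input rather than at the extractor itself.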