@AgaDob I've updated the fix, try pulling new commits on main and check if it works 😄
Brilliant, thank you - varying the batch size now works for inference! However, my model still only outputs blank strings at inference - any ideas?
@AgaDob Did you test with greedy decoding only? Can you show me a plot of the training losses? An underfit model will produce wrong outputs.
@usimarit I've trained a model on train-clean-100 with word-pieces and I think you are right, it is quite underfit - have you seen this issue when training with word-pieces? As a proof of concept I've also trained on the test set, and with a beam width of 2 it outputs the character 't' for every single utterance - I'm just wondering whether this could be an issue with the code rather than with the models?
In fact, my test output looks very much like #105 where their model is outputting the letter 'i' for every utterance. However, I am saving my results to a new test file each time so that's not the issue...
```
PATH	GROUNDTRUTH	GREEDY	BEAMSEARCH	BEAMSEARCHLM
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0000.flac	the three modes of management	i
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0001.flac	to suppose that the object of this work is to aid in effecting such a substitution as that is entirely to mistake its nature and design	i
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0002.flac	by reason and affection	i
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0003.flac	as the chaise drives away mary stands bewildered and perplexed on the door step her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her mother disappointment vexation and ill humor surge and swell among those delicate organizations on which the structure and development of the soul so closely depend doing perhaps an irreparable injury	i
```
@AgaDob I haven't had a chance to properly train and test the streaming transducer model, but you can customize it in your own way to overcome the underfitting, for example by changing the optimizer, applying a learning rate schedule, etc.
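As a rough illustration of that suggestion, here is a minimal sketch using plain `tf.keras` APIs - the Transformer-style warm-up schedule and the `d_model=320` / `warmup_steps=4000` values below are assumptions for illustration, not values taken from the repo's configs:

```python
import tensorflow as tf

# Minimal sketch (not the repo's exact API): a Transformer-style warm-up
# learning-rate schedule plugged into Adam, as one way to fight underfitting.
class WarmupSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model=320, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        # LR ramps up for `warmup_steps`, then decays as 1/sqrt(step).
        step = tf.cast(step, tf.float32)
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps ** -1.5)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)

optimizer = tf.keras.optimizers.Adam(
    learning_rate=WarmupSchedule(d_model=320, warmup_steps=4000),
    beta_1=0.9, beta_2=0.98, epsilon=1e-9,
)
# model.compile(optimizer=optimizer, ...)  # then train as usual
```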
You are right, I think it's an issue with the model not converging. I've tried with a tiny architecture and the testing scripts work perfectly. Thank you for the fantastic resource! 🙌
On a different note: why is the validation loss lower than the training loss for these models?
@AgaDob maybe you applied SpecAugment as in the example config, which causes the val loss to be lower than the train loss - the augmentation is only applied during training, so the train loss is computed on distorted inputs while the val loss is computed on clean ones.
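A stripped-down sketch of that asymmetry - the `time_mask` helper, the mask width, and the dataset names below are hypothetical, not the repo's implementation:

```python
import tensorflow as tf

def time_mask(spec, max_width=40):
    """Zero out one random band of time frames in a [time, freq] spectrogram
    (a stripped-down stand-in for SpecAugment's time masking)."""
    time_len = tf.shape(spec)[0]
    width = tf.random.uniform([], 1, max_width + 1, dtype=tf.int32)
    start = tf.random.uniform([], 0, tf.maximum(time_len - width, 1), dtype=tf.int32)
    frames = tf.range(time_len)[:, None]
    keep = tf.cast((frames < start) | (frames >= start + width), spec.dtype)
    return spec * keep  # broadcasts the mask over the frequency axis

# Masking is applied to the training split only, so the train loss sees
# distorted inputs while the validation loss sees clean ones:
# train_ds = train_ds.map(lambda spec, label: (time_mask(spec), label))
# val_ds is left untouched.
```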
Hi! I have cloned the latest version of the repo and I am running into two issues when using the example configs for the streaming transducer model:

1) The `test_streaming_transducer.py` script only runs with a batch size of 1 - with any other batch size, e.g. 3, I get: `ValueError: Dimension 0 in both shapes must be equal, but are 3 and 1. Shapes are [3,1024] and [1,1024].`
2) When I train with a batch size of 1 on LibriSpeech and run `test_streaming_transducer.py`, the model outputs only blank strings for all audios... Any ideas?

I include the full traceback for issue 1) below:
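As an aside on the shape error in 1): a `[1, 1024]` tensor meeting a `[3, 1024]` batch usually means some initial state was created with a hard-coded batch size of 1. This is only a guess at the cause, not the fix that was actually pushed; a minimal sketch of building the state from the runtime batch size instead (the `make_zero_states` helper and the `1024` state size are assumed for illustration):

```python
import tensorflow as tf

def make_zero_states(encoded, state_size=1024):
    """Build per-utterance zero states whose leading dimension matches the
    actual batch in `encoded` ([batch, time, dim]) instead of a fixed 1,
    which is the kind of mismatch that yields '[3, 1024] vs [1, 1024]'."""
    batch = tf.shape(encoded)[0]          # dynamic batch size (1, 3, ...)
    return tf.zeros([batch, state_size], dtype=encoded.dtype)

# e.g. states = make_zero_states(encoder_outputs) before running the
# prediction network / decoding loop over the whole batch.
```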