Closed kaoh closed 3 years ago
Thank you @kaoh for such detailed comments and updates. Allow me some time (two weeks) to run the steps.
Any other updates are always welcome!!
Yes, sure. That reminds me: I forgot to mention one error I ran into during training, which I have also documented in the README:
NOTE: In case a `Not enough time for target transition sequence (required: 171, available: 0)` error is thrown, the currently only known fix is to edit the file `DeepSpeech/training/deepspeech_training/train.py` and add `, ignore_longer_outputs_than_inputs=True` to the call to `tfv1.nn.ctc_loss`.
I assume this is related to training data with no corresponding wav file, or to a wav recording whose length does not match its transcript (or the other way around). I was not able to trace down the error, because the `Tensor` class is not debugging-friendly and does not reveal the vector values easily.
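One way to check the suspected mismatch between audio length and transcript before training is to compare the frames each clip can yield against its label length. A minimal sketch, assuming DeepSpeech-style training CSVs (`wav_filename,wav_filesize,transcript`), 16 kHz mono 16-bit PCM wavs with a 44-byte header, and a ~10 ms feature stride; the file names and sizes below are made up:

```shell
# Create a tiny example CSV (stand-in for a real train.csv):
cat > train.csv <<'EOF'
wav_filename,wav_filesize,transcript
ok.wav,64044,guten morgen
broken.wav,44,ein viel zu langer satz
EOF

# frames ~= ((filesize - 44 byte header) / 2 bytes per sample / 16000 Hz) * 100
# Flag rows where the transcript cannot fit into the available frames:
awk -F, 'NR > 1 {
    frames = int((($2 - 44) / 2 / 16000) * 100)
    if (frames < length($3))
        print "suspect: " $1 " (frames=" frames ", label=" length($3) ")"
}' train.csv
# -> suspect: broken.wav (frames=0, label=23)
```

Rows flagged this way are exactly the ones `ctc_loss` would otherwise reject at training time.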
@kaoh: This is really awesome. I wasn't able to run all the steps (I need more time), but I will let the community report any issues they encounter.
Merging the request.
Thanks. I forgot to mention that I stripped the TODO section from the README. The TODOs were outdated, and some other things have been added in the meantime (Mailabs, SWC) which were not mentioned. If you want, you can add a feature section again if you feel any of this is noteworthy. For TODOs in general I would use GitHub issues.
- updated documentation
- added download script
- added script for creating the language model (scorer)
- added script for exporting TFLite
- added fine-tuning support and updated the transfer learning script
- KenLM and DeepSpeech as submodules
- removed local scripts not used for speech processing
- added parameters to the script files
- used the DeepSpeech image without a version
A lot has changed since DeepSpeech 0.5.0. I have tried to fix the scripts, make them more flexible, and bring the documentation back in sync. The person who generated the 0.9.0 version should be able to validate the new approach.
A potential issue is the use of a DeepSpeech version that was not validated against this project. Using git submodules prevents this, since only checked versions are used; when a new version is adopted, the submodules must be updated.
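As a sketch of that submodule workflow (the submodule path and the tag name below are placeholders, not the pinned versions of this project):

```shell
# After cloning this repository, fetch the pinned DeepSpeech/KenLM versions:
git submodule update --init --recursive

# When a new DeepSpeech release has been validated, re-pin the submodule:
cd DeepSpeech
git fetch --tags
git checkout v0.9.0        # placeholder tag
cd ..
git add DeepSpeech
git commit -m "Pin DeepSpeech submodule to v0.9.0"
```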
Training and new scorer
E.g. in 0.9.0 the trie is calculated completely differently and a scorer is used now, which does not work with the old scripts because the parameter names have changed and the old ones are unsupported.
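For illustration, a sketch of building a scorer with the 0.9-line tooling; the corpus path, vocabulary size, and the alpha/beta values are placeholders, and the exact flags should be checked against `data/lm` in the DeepSpeech repository:

```shell
# Build the KenLM language model from a text corpus:
python3 DeepSpeech/data/lm/generate_lm.py \
  --input_txt corpus.txt --output_dir lm \
  --top_k 500000 --kenlm_bins KenLM/build/bin \
  --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

# Package it into the single scorer file the 0.9.x decoder expects:
./generate_scorer_package --alphabet alphabet.txt \
  --lm lm/lm.binary --vocab lm/vocab-500000.txt \
  --package kenlm.scorer --default_alpha 0.93 --default_beta 1.18
```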
Transfer learning
Also, in (I think) version 0.6.0 of DeepSpeech, transfer learning was introduced. The old scripts did not use this newer approach of dropping source layers but simply continued the training while adjusting the alphabet. I have fixed this. Whether the chosen values make sense should be reconsidered: by default I drop only the last output layer, and the number of epochs might also be reconsidered.
Fine-tuning
I have added a script for fine-tuning which replaces the German-to-German part of "Transfer Learning", because that is just fine-tuning. I use only 5 epochs, which must be qualified by real tests to see whether the loss keeps decreasing with more epochs.
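The difference between the two modes comes down to the training flags. A sketch using DeepSpeech 0.9.x flag names; the checkpoint paths, data files, and hyperparameter values are placeholders, not the values used by the scripts:

```shell
# Transfer learning English -> German: drop the output layer so the new
# alphabet can be learned:
python3 DeepSpeech/DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path alphabet.txt \
  --load_checkpoint_dir checkpoints/deepspeech-0.9.0 \
  --save_checkpoint_dir checkpoints/transfer \
  --train_files train.csv --dev_files dev.csv --test_files test.csv \
  --epochs 10 --learning_rate 0.0001 --dropout_rate 0.25

# Fine-tuning German -> German: same alphabet, no dropped layers,
# just continue training from the checkpoint:
python3 DeepSpeech/DeepSpeech.py \
  --alphabet_config_path alphabet.txt \
  --load_checkpoint_dir checkpoints/deepspeech-0.9.0 \
  --save_checkpoint_dir checkpoints/fine-tuned \
  --train_files train.csv --dev_files dev.csv --test_files test.csv \
  --epochs 5 --learning_rate 0.0001
```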
Download speech corpus
I streamlined the speech corpus download. audiomate was already used for Voxforge; now audiomate is used for all downloads, leaving just a single script for the whole work. I fixed two issues in audiomate, hence I'm using a patched version from my repo.
Parameters?
Are the learning rates and dropout rates actually the best values found by experimenting with this project? The DeepSpeech examples use a rather low learning rate of 0.0001.
Unused scripts
I also removed some local scripts referring to local files and tests which are not used in the project; in case they are needed, please keep them locally. I also removed the IDEA project files. They should be kept locally by the developer: they cause too many merge conflicts, and the IDE used is a developer preference.
Tests still to come
Still to come are at least two tests checking the functional recognition of some samples against both the full and the TFLite model.
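Such a test could be as simple as decoding a known sample with the `deepspeech` CLI and comparing the transcript. A sketch; the model, scorer, and audio paths and the expected text are placeholders, and the TFLite check assumes a `deepspeech` build with the TFLite runtime:

```shell
expected="guten morgen"

# Check the full model:
actual=$(deepspeech --model output_graph.pbmm --scorer kenlm.scorer \
                    --audio samples/guten_morgen.wav)
[ "$actual" = "$expected" ] || { echo "full model mismatch: $actual"; exit 1; }

# The same check against the TFLite export:
actual=$(deepspeech --model output_graph.tflite --scorer kenlm.scorer \
                    --audio samples/guten_morgen.wav)
[ "$actual" = "$expected" ] || { echo "TFLite model mismatch: $actual"; exit 1; }
```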