NormanTUD / TrainDeepSpeechGerman

This contains a Singularity container in which you can train the DeepSpeech neural network
GNU General Public License v3.0

Request for pretrained German model #1

Open CobraCalle opened 4 years ago

CobraCalle commented 4 years ago

Hello,

First of all: thank you very much for your effort in providing this highly automated training script.

Sadly I'm a real Windows guy and my Linux skills are very "limited". I've installed Debian on Windows and tried to get this up and running, but it looks like Singularity will not work on Linux under Windows :-( And setting up a bare-metal Linux system with GPU access looks like a little overkill to me :-)

Would it be possible to provide a pre trained model for German language?

Thank you very much

Carl

NormanTUD commented 4 years ago

Hi Carl, thanks for your interest in this. Right now, I am affected by this virus -> https://www.datacenterdynamics.com/en/news/european-supercomputers-hacked-mine-cryptocurrency/ , so I cannot access the latest data. But I've uploaded the best model I had available from before then here:

https://github.com/NormanTUD/TrainDeepSpeechGerman/tree/master/model

Sadly, I cannot give you the checkpoints for further training, because, as of now, I still cannot access the supercomputer.

When my university opens SSH connections again and I can access the supercomputer, I will hopefully be able to upload better models and also the checkpoints.

You can use it with

./deepspeech --model 51.565295.output_graph.pb --audio audiofile.wav

(Remember to use DeepSpeech 0.7.x, as earlier versions are incompatible with this)
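One more note, off the top of my head (so please double-check): as far as I know, DeepSpeech expects 16 kHz, mono, 16-bit WAV input, so recordings in another format should be converted first. If you have sox available, something roughly like this should do it (input.wav and converted.wav are just placeholder names):

```
# convert a WAV file to 16 kHz, mono, 16-bit, which DeepSpeech expects
sox input.wav -r 16000 -c 1 -b 16 converted.wav
```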

Have a nice day,

Norman

CobraCalle commented 4 years ago

WOW... thank you very much for the fast response. Is this compatible with DeepSpeech 0.7, or which version do I have to use?

NormanTUD commented 4 years ago

You're very welcome and I hope this will help you. I'm using DS 0.7.0-alpha.3, but any 0.7.x version should be compatible with this (see https://github.com/mozilla/DeepSpeech/releases where it says "This is a bugfix release and retains compatibility with the 0.7.0 models" about 0.7.1). If you encounter any problems, feel free to ask me again. (I don't know much about Windows though, as I'm more of a "Linux guy".) Have a nice day!

CobraCalle commented 4 years ago

I made a small test... and it "worked" but the result is not very good....

http://www.diomex.de/downloads/02ac6085-4a6d-4b4b-ba94-a56794ebd3af - Command.wav

For example, when I process the attached file, the result is "tunehergar eintailen" (instead of "Verstärker einschalten"). Is that normal, or do I have to tune some kind of parameters or something like that? (Sorry, I'm totally new to DeepSpeech. At the moment my solution uses Microsoft Azure Speech and I'm looking for a way to do the speech-to-text part offline.)

CobraCalle commented 4 years ago

Are you sure the model is for the German language? If I use an audio file with just the word "Computer", the result is "omputer"... OK... the missing first letter could be related to audio quality... but it looks to me as if the model is for another language....

CobraCalle commented 4 years ago

Here is the console output of my test (the last line is repeated many times):

TensorFlow: v1.15.0-24-gceb46aae58
DeepSpeech: v0.7.1-0-g2e9c281d
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-05-25 19:04:23.779597: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-05-25 19:04:24.227352: W tensorflow/core/kernels/rnn/lstm_ops.cc:864] BlockLSTMOp is inefficient when both batch_size and input_size are odd. You are using: batch_size=1, input_size=655
2020-05-25 19:04:24.227462: W tensorflow/core/kernels/rnn/lstm_ops.cc:869] BlockLSTMOp is inefficient when both batch_size and cell_size are odd. You are using: batch_size=1, cell_size=655

NormanTUD commented 4 years ago

Yes, this is a German model (based on the German language and a German trie), but it certainly needs more training (sorry, I should have mentioned this before). The loss is still quite high and sometimes the results are bad (though I've had some pretty good results already, too, depending on what was said). Please wait until I can access the supercomputer again. Shortly before it was shut down due to the virus, I instructed it to run several thousand more parameter configurations in search of a better model. I will notify you here as soon as I get the new models, but I cannot yet say how long this will take. Sorry for the inconvenience!

CobraCalle commented 4 years ago

Thank you very much... That sounds really great 👍

CobraCalle commented 4 years ago

OK... I think I've learned something :-) I have to use a scorer to get good results :-) But if I use the scorer provided in the DeepSpeech repo, the results are better, but not good... I think that is related to the fact that the scorer is trained for English... do you have a scorer for German too?

NormanTUD commented 4 years ago

Thanks for this info, I've stumbled upon this too while learning. I've used this scorer model here -> https://www.kaggle.com/mischi001/germanlmkenlm This is based on lots of German texts, afaik. If this helps you get better results, please report it, as I'd be very interested in this too.

CobraCalle commented 4 years ago

Thanks a lot... the results are slightly better now... but sadly not good enough for real-world usage. So perhaps I'll have to wait for a more thoroughly trained version.

Can you explain to me what the file "de_alphabet.txt" is? Is it needed?

NormanTUD commented 4 years ago

Hey CobraCalle, thanks very much for this idea. I never thought of adding the parameter --scorer to the recognizer command (only to the learning command). On my local computer with my cheap microphone, the results are waaaay better now (first try: "ich liebe meine katze"). But yes, we'll still need to wait for the supercomputer to get back up. As said, as soon as I've got access again and it has finished training better models, I will inform you here.
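For reference, the recognizer call with the scorer attached looks roughly like this (I've named the downloaded scorer file kenlm.scorer here; that name is just a placeholder for whatever you call the file from the Kaggle link above):

```
# model + external German scorer; "kenlm.scorer" is a placeholder filename
./deepspeech --model 51.565295.output_graph.pb --scorer kenlm.scorer --audio audiofile.wav
```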

The de_alphabet.txt is mainly needed for training. As far as I understand DS, it works roughly like this:

You feed in an audio file that gets "chopped up" into several smaller parts, and then a 6-layered neural network tries to map each part to a letter from the alphabet.txt (including äöß for the German language). That output then gets mapped through the scorer file (which contains trigrams from many texts, e.g. for the word "Kapitän" they are "Kap", "api", "pit", "itä", "tän", together with the likelihood of one trigram following another) to find the most likely word combination for the sounds in the audio file.
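To give you a rough idea of the file itself (this is a sketch from memory, so the actual de_alphabet.txt in the repo may differ slightly): it is simply one character per line, with comment lines starting with #, and the first real entry being a literal space:

```
# de_alphabet.txt (sketch): one character per line; the line right below this comment is a literal space
 
a
b
c
# ... d through y omitted here for brevity ...
z
ä
ö
ü
ß
```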

But to be honest: I do not have an extremely deep understanding of DS, just some basic-level knowledge.

I hope this is helpful. As I've said: as soon as I've got better models, I will post them immediately.

CobraCalle commented 4 years ago

That sounds great!

For your information: I'm working on a voice-controlled home automation system that is highly customizable, secure, and can be integrated into every room in a house... so the result is an experience like the board computer on Star Trek TNG. It works pretty great and can be integrated with any home automation hardware/system (and supports tons of features Alexa & Co. do not have at the moment, including presence detection through face detection, etc.).

At the moment I use Microsoft Azure for speech-to-text because it delivers by far the best recognition results... even under noisy conditions. But I'm still looking for a real offline solution (for privacy and performance reasons).

NormanTUD commented 4 years ago

Cool project, I wanted to work on something very similar to what you're describing. That's why I got interested in DeepSpeech.

Are you planning on releasing this? I'd really appreciate that.

CobraCalle commented 4 years ago

Yes, I will... at the moment I'm running a closed beta, but if you like you can join.

Have a look at this (I've posted it at IP Symcon because that's the home automation system I'm using at home, but the plug-in model allows the integration of everything that has an API :-))...

https://www.symcon.de/forum/threads/43347-Custom-Sprachsteuerung-Board-Computer

CobraCalle commented 4 years ago

Are you still there?

NormanTUD commented 4 years ago

Yes, I'm sorry for the delay. I wrote a little test with the current state of DeepSpeech for Speech-to-Text. You can see this test here -> https://www.youtube.com/watch?v=MGxneP4vVFQ&feature=youtu.be (It's not perfect, but already better than nothing at all).

The home automation system sounds really interesting, but I have not yet had time to look into it more deeply. I will probably do this over the weekend.

Thanks!

CobraCalle commented 4 years ago

That's looking really great. Is that the same model you provided to me, with the same scorer? I think the significantly better accuracy in your sample is due to the better input. My tests were made with recorded commands from productive use of my voice control system (the user stands meters away... noisy environment... etc.). The voice control system does normalisation and noise cancelling before feeding the audio into the speech recognizer... I'll implement a voice recognizer based on DeepSpeech and your model and will do some real-world testing in the next days...

NormanTUD commented 4 years ago

To be honest, I'm very excited about how well this actually works for such an early (and very stupidly designed) prototype. https://www.youtube.com/watch?v=mm0DwkwonRo&feature=youtu.be Really looking forward to getting the new models from the supercomputer.

NormanTUD commented 4 years ago

Yes, this is the same model and the same scorer.

I'm glad that you might find this useful. As I've said, I'm really looking forward to getting the new models :-)