earlephilhower / ESP8266Audio

Arduino library to play MOD, WAV, FLAC, MIDI, RTTTL, MP3, and AAC files on I2S DACs or with a software emulated delta-sigma DAC on the ESP8266 and ESP32
GNU General Public License v3.0
2.01k stars, 432 forks

[Feature Request] - Managing Google TTS #475

Open schmurtzm opened 2 years ago

schmurtzm commented 2 years ago

Google TTS is a killer feature for the ESP because of its multilingual support and voice quality. Following up on issue #395: it does not seem possible to use a Google TTS URL like a regular MP3 stream, because of a buffering problem at the end of playback.

An easy way to reproduce the problem: take the "StreamMP3FromHTTP" example and replace const char *URL="http://kvbstreams.dyndns.org:8000/wkvi-am"; with const char *URL="http://translate.google.com/translate_tts?ie=UTF-8&q=bonjour&tl=fr&client=tw-ob&ttsspeed=1";
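To try phrases other than "bonjour", the text has to be percent-encoded before it goes into the q parameter. Below is a minimal sketch in plain C++; the helper names urlEncode and buildGoogleTtsUrl are hypothetical (not part of ESP8266Audio), and the query parameters are simply copied from the URL above, with no guarantee that the endpoint keeps accepting them.

```cpp
#include <cstdio>
#include <string>

// Percent-encode a UTF-8 string for use as a URL query value.
// RFC 3986 "unreserved" characters pass through; every other byte becomes %XX.
std::string urlEncode(const std::string &text) {
    std::string out;
    char buf[4];  // '%' + two hex digits + terminating NUL
    for (unsigned char c : text) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: rebuilds the URL from this issue for an arbitrary
// phrase and language code. The parameters (ie, q, tl, client, ttsspeed)
// are taken as-is from the URL quoted above.
std::string buildGoogleTtsUrl(const std::string &text, const std::string &lang) {
    return "http://translate.google.com/translate_tts?ie=UTF-8&q=" +
           urlEncode(text) + "&tl=" + lang + "&client=tw-ob&ttsspeed=1";
}
```

In the example sketch, the result of buildGoogleTtsUrl("bonjour", "fr").c_str() could then be passed wherever URL is used.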

Here is a recording of the sound I get.

Handling Google TTS seems more complex than I thought... I found this I2S library, which supports Google TTS: ESP32-audioI2S

As you can see here, the author wrote a complex function to handle Google TTS. It could be a good source of inspiration for a new AudioFileSourceGoogleTTS.h 😅

Interesting fact: I also tried playing Google TTS with this library without that specific function, using just the Google Translate URL, and the result was similar to ESP8266Audio: there is a buffer problem at the end of playback. It doesn't hang the ESP with this library, but it suggests that Google TTS sends the MP3 file in a particular way that requires more work than a classic MP3 stream.

FedericoBusero commented 2 years ago

Also note the following change in ESP32-audioI2S, which might help with detecting the end of the stream:

https://github.com/schreibfaul1/ESP32-audioI2S/commit/ed13136ebe0b5da6d8377912596c6e0fce5ba96d#diff-6033949d768051c96b2a380edec84a3e1cbe58b3a94e41aae408c6a8f8bbc2b2

schmurtzm commented 2 years ago

After some investigation, I found where the problem is: as you can see here in ESP32-audioI2S, the author makes a special exception for TTS: "tts has one chunk only".

There are several issues about chunk management in ESP8266Audio, most of them for ICY (SHOUTcast) streams. Looking at chunk management in ESP8266Audio, we quickly find a pull request from yoav-klein that greatly improves the result with Google TTS: it now hangs for a few seconds (about 11 seconds) instead of minutes!

From what I understand, some data in the MP3 stream indicates the size of the file. I think that is what the ESP32-audioI2S library handles here.
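To illustrate the idea, here is a minimal sketch that assumes the "chunks" in question are HTTP/1.1 chunked transfer encoding, where each chunk starts with its size in hexadecimal and a zero-size chunk marks the end of the body. That assumption is not confirmed in this thread. The function decodeChunked is hypothetical, works on a complete in-memory buffer rather than a live socket, and is not how either library implements it.

```cpp
#include <cstdlib>
#include <string>

// Decodes an HTTP/1.1 chunked body held entirely in memory.
// Returns true and fills `out` once the terminating zero-size chunk is seen;
// returns false if the data is truncated or malformed.
bool decodeChunked(const std::string &raw, std::string &out) {
    out.clear();
    std::size_t pos = 0;
    while (true) {
        // Each chunk begins with "<hex size>[;extensions]\r\n".
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;      // size line incomplete
        std::string sizeLine = raw.substr(pos, eol - pos);
        char *end = nullptr;
        unsigned long size = std::strtoul(sizeLine.c_str(), &end, 16);
        if (end == sizeLine.c_str()) return false;       // no hex digits found
        pos = eol + 2;

        if (size == 0) return true;                      // last chunk: end of audio

        // The chunk payload must be followed by its own "\r\n".
        std::size_t remaining = raw.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        out.append(raw, pos, size);
        pos += size;
        if (raw.compare(pos, 2, "\r\n") != 0) return false;
        pos += 2;
    }
}
```

Under that assumption, a short TTS clip could arrive as a single chunk followed by the zero-size chunk. A reader that does not recognise the zero-size chunk as the end of the body would keep waiting for more data, which would be consistent with the hang described above.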

@yoav-klein & @DSangyy, I saw that you have worked on chunk management. If you have some time to take a look, it would be very welcome ;) Thank you very much 😉