desbma / GoogleSpeech

Read text using Google Translate TTS API
GNU Lesser General Public License v2.1

What is the difference between speech.play() and speech.save()? #19

Closed goldengrape closed 5 years ago

goldengrape commented 5 years ago

If a sentence contains both Chinese and English and the language is set to zh-cn, the Google Translate TTS API reads the English part with a Chinese voice. This sounds very bad.

So I tried modifying the __next__ function so that English segments are read with an English voice and Chinese segments with a Chinese voice:

        for segment_num, segment in enumerate(segments):
            # English segments use the secondary language, everything else the default
            if __class__.is_EN(segment):
                now_lang = self.switch_lang
            else:
                now_lang = self.default_lang
            yield SpeechSegment(segment, now_lang, segment_num, len(segments))
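To try the same idea outside the library, here is a minimal standalone sketch of the loop above. The is_en heuristic and the tag_segments helper are hypothetical (they are not the is_EN from the linked notebook, nor part of GoogleSpeech), and the sketch yields plain tuples instead of SpeechSegment objects so it runs without the library.

```python
import re

# Hypothetical heuristic standing in for is_EN(): a segment counts as English
# if it contains at least one Latin letter and no CJK ideographs.
_CJK = re.compile(r"[\u4e00-\u9fff]")
_LATIN = re.compile(r"[A-Za-z]")


def is_en(segment: str) -> bool:
    """Return True if the segment looks like English text."""
    return bool(_LATIN.search(segment)) and not _CJK.search(segment)


def tag_segments(segments, default_lang="zh-cn", switch_lang="en"):
    """Yield (text, lang, index, total) tuples, one per segment.

    English segments get switch_lang; everything else falls back to default_lang.
    """
    for segment_num, segment in enumerate(segments):
        lang = switch_lang if is_en(segment) else default_lang
        yield segment, lang, segment_num, len(segments)


if __name__ == "__main__":
    for item in tag_segments(["你好", "hello world", "再见"]):
        print(item)
```

Digits-only or punctuation-only segments fall back to the default language here, which is one possible choice among several.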

This is very strange. When I use speech.play(), both the Chinese and the English parts are read as expected. But when I use speech.save(), the English part disappears.

I even tried this:

    def savef(self, file):
        """Write audio data into a file object."""
        for segment in self:
            segment.play()  # plays fine, English part included
            file.write(segment.getAudioData())  # same bytes that play() used

I can hear the English part, but it is missing from the MP3 file.

As I understand it, both play() and save() go through the SpeechSegment class.

speech.save() writes segment.getAudioData() to the file. speech.play() calls SpeechSegment.play(), which also calls segment.getAudioData() first, and what SpeechSegment.play() plays is exactly that data.

So I don't understand what the difference between the two is.
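A simplified model of the two paths described above may help. FakeSegment, play_all and save_all are hypothetical stand-ins, not GoogleSpeech's real classes: both paths use the same per-segment bytes, but play hands each segment to the player as its own stream, while save appends every segment to one file.

```python
import io


class FakeSegment:
    """Hypothetical stand-in for SpeechSegment holding pre-fetched MP3 bytes."""

    def __init__(self, audio: bytes):
        self._audio = audio

    def getAudioData(self) -> bytes:
        return self._audio


def play_all(segments, player):
    """play() path: each segment's bytes are handed over as a separate stream."""
    for segment in segments:
        player(segment.getAudioData())


def save_all(segments, file):
    """save() path: all segments' bytes are appended to a single file."""
    for segment in segments:
        file.write(segment.getAudioData())


if __name__ == "__main__":
    segs = [FakeSegment(b"<zh frames>"), FakeSegment(b"<en frames>")]
    played = []
    play_all(segs, played.append)  # player receives two independent streams
    buf = io.BytesIO()
    save_all(segs, buf)            # file receives one concatenated stream
    print(played)
    print(buf.getvalue())
```

Since the bytes are identical in both paths, any difference has to come from how a player decodes one concatenated stream versus several separate ones.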

desbma commented 5 years ago

Can you provide an example sentence with mixed languages so I can reproduce this?

goldengrape commented 5 years ago

Screen recording: https://drive.google.com/open?id=1-F48IAUNvy7olUQ33llflcq6hyMeFI6c

Saved audio: https://drive.google.com/open?id=1iH3BUtilSnaSYnXg9nYVguICAgM0iRYR

I also noticed something interesting: if you open this MP3 file with QuickTime or PowerPoint, the English part is gone.

But if you open the same file with VLC, the English part can still be heard.

Could some marker in the MP3 file header be wrong after the audio is combined?

desbma commented 5 years ago

Yes, I also suspect the MP3 file concatenation is faulty.

Can you please paste the sentence from your example here, so I can reproduce it?

goldengrape commented 5 years ago

Here: https://github.com/goldengrape/dubbing-pptx/blob/gtts/google_tts.ipynb

I rewrote __init__, __next__ and split_pattern in the Speech class, and added an is_EN method.

I removed the CLI part and added a splitter for Chinese and English.
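A splitter of this kind can be sketched with one regular expression that alternates between runs of CJK characters and runs of everything else. This is a hypothetical illustration, not the splitter from the linked notebook; the Unicode ranges (CJK Unified Ideographs plus CJK and full-width punctuation) are an assumption and skip rarer ideograph blocks.

```python
import re

# Assumed character ranges: CJK Unified Ideographs, CJK symbols/punctuation,
# and half-/full-width forms (e.g. the full-width comma "，").
_CJK_CLASS = r"\u4e00-\u9fff\u3000-\u303f\uff00-\uffef"

# Alternate between a run of CJK characters and a run of anything else.
_RUNS = re.compile(rf"[{_CJK_CLASS}]+|[^{_CJK_CLASS}]+")


def split_zh_en(text: str) -> list[str]:
    """Split mixed text into CJK and non-CJK runs, dropping blank runs."""
    return [run.strip() for run in _RUNS.findall(text) if run.strip()]


if __name__ == "__main__":
    print(split_zh_en("我喜欢Python编程"))  # ['我喜欢', 'Python', '编程']
```

ASCII punctuation stays attached to the neighbouring non-CJK run, so a real splitter may want to trim or redistribute it.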

I didn't touch the save() part.

Another interesting phenomenon: if a sentence consists of part A (Chinese)... part B (English)... part C (Chinese), saving it to MP3 gives only part A (Chinese)... part C (Chinese); part B is missing.

desbma commented 5 years ago
    $ google_speech -o a.mp3 -l en 'hey'
    $ ffprobe -hide_banner a.mp3
    Input #0, mp3, from 'a.mp3':
      Duration: 00:00:00.77, start: 0.000000, bitrate: 32 kb/s
        Stream #0:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
    $ google_speech -o b.mp3 -l zh-cn '你好'
    $ ffprobe -hide_banner b.mp3
    Input #0, mp3, from 'b.mp3':
      Duration: 00:00:00.94, start: 0.000000, bitrate: 32 kb/s
        Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 32 kb/s

So here is your explanation: the English file has a 24 kHz sampling rate and the Chinese one 22.05 kHz. Don't ask me why, I have no idea. Because the files have different formats, they cannot be joined by simply concatenating them.
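ffprobe gets the sampling rate from the MP3 frame headers. A minimal pure-Python sketch of the same check, assuming plain MPEG audio frames optionally preceded by an ID3v2 tag, is below; first_frame_sample_rate and can_concatenate are hypothetical helper names. It decodes the version and sample-rate-index bits of the first frame header, which is enough to tell a 24000 Hz segment from a 22050 Hz one before joining them.

```python
# MPEG audio sample rates, indexed by the 2-bit version ID and then by the
# 2-bit sample-rate index from the frame header (index 3 is reserved).
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG-1
    0b10: (22050, 24000, 16000),  # MPEG-2
    0b00: (11025, 12000, 8000),   # MPEG-2.5 (version 0b01 is reserved)
}


def _skip_id3v2(data: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # Tag size is a 28-bit "syncsafe" integer: 7 useful bits per byte.
        size = ((data[6] & 0x7F) << 21 | (data[7] & 0x7F) << 14
                | (data[8] & 0x7F) << 7 | (data[9] & 0x7F))
        footer = 10 if data[5] & 0x10 else 0  # optional 10-byte footer
        return 10 + size + footer
    return 0


def first_frame_sample_rate(data: bytes) -> int:
    """Return the sample rate in Hz of the first MPEG audio frame in data."""
    i = _skip_id3v2(data)
    while i + 4 <= len(data):
        b1, b2 = data[i + 1], data[i + 2]
        # 11-bit frame sync: 0xFF followed by three set high bits.
        if data[i] == 0xFF and (b1 & 0xE0) == 0xE0:
            version = (b1 >> 3) & 0x03
            layer = (b1 >> 1) & 0x03
            rate_index = (b2 >> 2) & 0x03
            if version in SAMPLE_RATES and layer != 0 and rate_index != 3:
                return SAMPLE_RATES[version][rate_index]
        i += 1
    raise ValueError("no MPEG audio frame header found")


def can_concatenate(*mp3_blobs: bytes) -> bool:
    """True if all blobs start at the same sample rate."""
    return len({first_frame_sample_rate(blob) for blob in mp3_blobs}) <= 1


if __name__ == "__main__":
    en = bytes([0xFF, 0xF3, 0x44, 0xC4])  # MPEG-2 Layer III header, 24000 Hz
    zh = bytes([0xFF, 0xF3, 0x40, 0xC4])  # same, but 22050 Hz
    print(first_frame_sample_rate(en), first_frame_sample_rate(zh))  # 24000 22050
    print(can_concatenate(en, zh))  # False
```

It only inspects the first frame of each blob, so it assumes each single-language file is internally consistent, which matches the ffprobe output above.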

As a workaround, if you run google_speech with -l en on Chinese characters, they are still read correctly in Chinese, and the output is 24 kHz.

Since this only happens when you modify the code to join two files generated with different languages, I am closing this issue.

goldengrape commented 5 years ago

@desbma Thank you! I will try to join them with ffmpeg.
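One way to do that with ffmpeg is to resample every input to a common rate and then join them with the concat filter, which decodes and re-encodes the audio instead of gluing MP3 bytes together. The sketch below only builds the command line; build_concat_cmd is a hypothetical helper, 24000 Hz is an assumed target rate matching the English files above, and the output codec is whatever ffmpeg picks for an .mp3 file name (normally libmp3lame, if your build includes it).

```python
def build_concat_cmd(inputs, output, sample_rate=24000):
    """Build an ffmpeg command that resamples every input, then joins them.

    Each input's audio goes through aresample so all segments share one
    sample rate before the concat filter; the joined audio is re-encoded.
    """
    cmd = ["ffmpeg", "-y"]  # -y: overwrite the output file if it exists
    for path in inputs:
        cmd += ["-i", path]
    # One resampling chain per input: [0:a] -> [a0], [1:a] -> [a1], ...
    chains = [f"[{i}:a]aresample={sample_rate}[a{i}]" for i in range(len(inputs))]
    # Feed the resampled streams, in order, into a single audio-only concat.
    joined = "".join(f"[a{i}]" for i in range(len(inputs)))
    chains.append(f"{joined}concat=n={len(inputs)}:v=0:a=1[out]")
    cmd += ["-filter_complex", ";".join(chains), "-map", "[out]", output]
    return cmd
```

With ffmpeg on PATH, the list can be passed straight to subprocess.run(build_concat_cmd(["zh.mp3", "en.mp3"], "joined.mp3"), check=True), where the file names are placeholders.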

goldengrape commented 5 years ago

I solved the problem with Pydub. Thank you!