desbma / GoogleSpeech

Read text using Google Translate TTS API
GNU Lesser General Public License v2.1
160 stars, 37 forks

pause every ~100 characters... #17

Closed chrisclarkson closed 5 years ago

chrisclarkson commented 5 years ago

Hi, when I use your software to turn sentences into MP3 files, there is a pause after every 98th character:

from google_speech import Speech

speech = Speech('The atomic structure of the nucleosome has been revealed by X-ray crystallography, delineating how this is important', 'en')
speech.play()
speech.save('out.mp3')

I get an unnatural-sounding pause after '...delineating how'. I would prefer it to read through the full sentence. Is this possible? Sorry for the naive question. Thanks in advance.

desbma commented 5 years ago

Unfortunately, the Google API limits the length of the string it receives, so google_speech splits the text at the last word boundary before the 100-character limit.
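Splitting at the last word before the limit can be sketched as a greedy word packer. This is a hypothetical illustration that assumes a plain 100-character limit; split_on_words is not google_speech's actual code.

```python
MAX_SEGMENT_SIZE = 100  # assumed API limit, per the comment above


def split_on_words(text, max_len=MAX_SEGMENT_SIZE):
    """Greedily pack whole words into segments of at most max_len characters."""
    segments = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_len:
            current = candidate
            continue
        if current:
            segments.append(current)
        # A single word longer than max_len has no word boundary: hard-cut it
        while len(word) > max_len:
            segments.append(word[:max_len])
            word = word[max_len:]
        current = word
    if current:
        segments.append(current)
    return segments
```

On the sentence from the first post, the first segment is the first 98 characters, ending at "delineating how", because adding " this" would make it 103 characters. That matches where the reporter hears the pause.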

If you want a more advanced behavior, the gTTS project has some logic to configure how sentences are split.

goldengrape commented 5 years ago

If the pause falls on a punctuation mark, it won't sound very abrupt. So you can split the string into substrings of at most 100 characters, each ending exactly at a punctuation mark.

I used this little trick with another online Chinese TTS engine.

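import re

def cut_string(text, Lmax):
    # Join lines with a space (not "") so words at line breaks stay apart, and end with
    # "\n" so the last chunk also has a terminator and is not lost
    text = text.replace("\n", " ") + "\n"
    # Up to Lmax-1 characters, then one terminator: a space, Chinese "，" or "。", a newline,
    # or a "." / "," that is not followed by a digit (so "3.14" is not split)
    punc = r"[\s\S]{0,%d}(?:[ ，。\n]|[.,](?!\d))" % (Lmax - 1)
    return iter(re.findall(punc, text))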
goldengrape commented 5 years ago

How about this:

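# Meant to go inside google_speech's Speech class (replacing its splitText), which defines
# MAX_SEGMENT_SIZE; the module must also `import re` and `import string`
@staticmethod
def splitText(text):
    useless_chars = frozenset(string.punctuation
                              + string.whitespace
                              + "!,。?、~@#¥%……&*():;《)《》“”()»〔〕-")  # this line is Chinese punctuation
    # Put the actual characters in the character class (a literal "[useless_chars|...]"
    # would only match the letters u, s, e, l, ...); re.escape makes them safe inside [...]
    punc_class = "[" + re.escape("".join(sorted(useless_chars))) + "]"
    # 2 to MAX_SEGMENT_SIZE - 1 characters, then one such character that is not a "." or ","
    # followed by a digit, so numbers like "3.14" are not split
    punc = r"[\s\S]{2," + str(__class__.MAX_SEGMENT_SIZE - 1) + r"}(?![.,]\d)" + punc_class
    segments = re.findall(punc, text)  # a list; wrap in iter() if an iterator is preferred
    return segments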
desbma commented 5 years ago

I'm having a hard time understanding what your regex does.

How does it handle strings like A.B.C.D?

goldengrape commented 5 years ago

For A.B.C.D, sadly, it will split into A. / B. / C. / D.

However, the quantifier {2," +str(__class__.MAX_SEGMENT_SIZE)+ "} sets the length limits. The minimum is 2, but you can change it to something like __class__.MIN_SEGMENT_SIZE.

Since MAX is set fairly large (e.g. 100 or 98), problems are unlikely. MIN can also be set larger, for example above 50, which avoids the "A.B.C.D." problem.

There is also a small bug: the string being split must end with punctuation; otherwise, the final substring is lost.
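One way around the lost final substring is to let each chunk end either at a terminator or at the very end of the string (\Z). This is a minimal sketch assuming a 100-character limit and a small ASCII terminator set; split_keep_tail is a hypothetical helper, not code from google_speech.

```python
import re

MAX_SEGMENT_SIZE = 100  # assumed limit, taken from the discussion above


def split_keep_tail(text, max_size=MAX_SEGMENT_SIZE):
    """Split into chunks of at most max_size characters that end at punctuation,
    whitespace, or the end of the string, so the last chunk is kept even when the
    text has no final punctuation (hypothetical helper)."""
    # Greedy body of 1..max_size-1 characters, then a terminator or end of string
    pattern = r"[\s\S]{1,%d}(?:[.,;:!?\s]|\Z)" % (max_size - 1)
    return re.findall(pattern, text)
```

The {1,...} lower bound matters here: with {0,...}, the pattern would also produce an empty match at the end of the string. A run of more than max_size - 1 characters with no terminator is still skipped, so this only fixes the tail problem.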

chrisclarkson commented 5 years ago

@goldengrape Hi, thanks for your input. I tried both of your solutions. I actually have a much longer text than my example (a full article).

First, I tried your cut_string method:

from google_speech import Speech

with open('long_text.txt', 'r') as f:
    text = f.read()
for segment in cut_string(text, 98):
    speech = Speech(segment, 'en')
    speech.play()

The pauses are still quite irregular and sound unnatural.

As for your second suggestion: I ran it with '@staticmethod' and then called the function:

splitText(text)

TypeError: 'staticmethod' object is not callable

I then tried running it without the '@staticmethod' and got:

splitText(text)
NameError: name 'string' is not defined
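Both errors come from running the snippet outside the class it was written for. splitText refers to __class__, which only exists for functions defined inside a class body, and before Python 3.10 a bare staticmethod object is not callable; the string module also has to be imported. A minimal sketch of the intended calling pattern, using a hypothetical TextSplitter class in place of google_speech's Speech class and a simplified body:

```python
import re
import string


class TextSplitter:
    """Hypothetical host class standing in for google_speech's Speech class."""

    MAX_SEGMENT_SIZE = 100  # assumed limit

    @staticmethod
    def splitText(text):
        # __class__ resolves to TextSplitter because this function is defined in its body
        terminators = re.escape(string.punctuation + string.whitespace)
        pattern = r"[\s\S]{1,%d}[%s]" % (__class__.MAX_SEGMENT_SIZE - 1, terminators)
        return re.findall(pattern, text)


# Call it through the class (or an instance), not as a bare module-level function
segments = TextSplitter.splitText("First clause, second clause.")
```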

Any suggestions would be appreciated.

@desbma Yes, I have tried gTTS in the past, but the pauses sound much worse there. I would open an issue on its GitHub page to ask about more advanced use of gTTS, but it seems swamped with questions.

goldengrape commented 5 years ago

@chrisclarkson Can you share an example recording? In what way does it sound irregular and unnatural?

I also thought of a possibly interesting solution: overlap consecutive audio clips. Cut off the unnatural sound at the end of one clip and at the beginning of the next, then merge them.
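The full overlap idea needs audio decoding and pause detection, but a simpler variant (swapped in here; it is not exactly what is proposed above) can be sketched: trim near-silent samples from the edges of each clip and join the clips with a fixed-length gap, so every pause between segments has the same length. This assumes each clip is already decoded to a list of mono float samples in [-1, 1]; the decoding step, the threshold value, and both helper names are assumptions.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def join_with_fixed_gaps(clips, gap_samples, threshold=0.01):
    """Trim edge silence from every clip, then join them with gap_samples zeros
    in between, so each pause between segments has the same length."""
    out = []
    for i, clip in enumerate(clips):
        if i > 0:
            out.extend([0.0] * gap_samples)
        out.extend(trim_silence(clip, threshold))
    return out
```

For example, at an assumed 24 kHz sample rate, a 150 ms pause is 0.15 × 24000 = 3600 samples.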

However, this may require more complicated code, such as detecting where the last pause occurs in the audio.

desbma commented 5 years ago

@chrisclarkson @goldengrape I have made some improvements to the text splitting logic. It now tries to split at a punctuation character before falling back to whitespace, so the result should sound more natural in some cases. It also detects punctuation more accurately, including in non-English languages.
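The punctuation-first, whitespace-second strategy can be sketched as follows. This is an illustration under stated assumptions, not google_speech's actual code: it treats the Unicode category "Po" (other punctuation, which covers . , ; : ! ? as well as 。，、) as a break point and assumes a 100-character limit.

```python
import unicodedata

MAX_SEGMENT_SIZE = 100  # assumed API limit, per the thread


def _is_break_punct(c):
    # "Po" = Unicode "other punctuation"; hyphens (Pd) such as in "X-ray" are not break points
    return unicodedata.category(c) == "Po"


def _last_index(window, pred):
    """Index of the last character in window satisfying pred, or -1."""
    for i in range(len(window) - 1, -1, -1):
        if pred(window[i]):
            return i
    return -1


def split_text(text, max_size=MAX_SEGMENT_SIZE):
    segments = []
    text = text.strip()
    while len(text) > max_size:
        window = text[:max_size]
        cut = _last_index(window, _is_break_punct)
        if cut >= 0:
            cut += 1  # 1) keep the punctuation mark at the end of the left chunk
        else:
            cut = _last_index(window, str.isspace)  # 2) fall back to whitespace
            if cut <= 0:
                cut = max_size  # 3) no break point at all: hard cut
        segments.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        segments.append(text)
    return segments
```

On the sentence from the first post, this cuts after "crystallography," rather than after "delineating how", so the pause lands on the comma.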

No splitting method is perfect for every case, but I think this generic behavior should work well most of the time.

desbma commented 5 years ago

I'm closing this, since the example sentence from the first post is fixed.