KoljaB / RealtimeTTS

Converts text to speech in realtime

wave.Error: file does not start with RIFF id #20

Open philuxzhu opened 10 months ago

philuxzhu commented 10 months ago

I ran tests/chinese_test.py and got an error. Does anyone know how to solve it?

Traceback:

Traceback (most recent call last):
  File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/text_to_stream.py", line 265, in synthesize_worker
    success = self.engine.synthesize(sentence)
  File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/engines/system_engine.py", line 72, in synthesize
    with wave.open(self.file_path, 'rb') as wf:
  File "/usr/local/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 509, in open
    return Wave_read(f)
  File "/usr/local/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 163, in __init__
    self.initfp(f)
  File "/usr/local/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 130, in initfp
    raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id

KoljaB commented 10 months ago

Maybe the sentence tokenizer is garbling the text.

Let's verify this. Add logging so we can see the text that is sent to the engine:

import logging
from RealtimeTTS import SystemEngine

# Enable debug-level logging globally and for the engine
logging.basicConfig(level=logging.DEBUG)
engine = SystemEngine(level=logging.DEBUG)

Then pass this parameter to play or play_async:

stream.play(log_synthesized_text=True)

You should now see log messages like this in the console:

INFO:root:synthesizing: <TEXT>

Is the text displayed correctly there or does the text look messed up?

badbye commented 10 months ago

I have the same problem. I am using macOS Ventura.

> Is the text displayed correctly there

Yes. @KoljaB

The problem is that the file generated by SystemEngine is in AIFF format instead of WAV.

(base) ➜  ~ mediainfo system_speech_synthesis.wav
General
Complete name                            : system_speech_synthesis.wav
Format                                   : AIFF
Format/Info                              : Apple/SGI
File size                                : 53.8 KiB
Duration                                 : 1 s 157 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 381 kb/s
FileExtension_Invalid                    : aiff aifc aif

Audio
Format                                   : PCM
Format settings                          : Big / Signed
Codec ID                                 : twos
Duration                                 : 1 s 157 ms
Bit rate mode                            : Constant
Bit rate                                 : 352.8 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 22.05 kHz
Bit depth                                : 16 bits
Stream size                              : 49.8 KiB (93%)
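
The same check can be done without mediainfo by reading the container's magic bytes. This is a minimal sketch and not part of RealtimeTTS; the helper names sniff_audio_container and sniff_audio_file are made up for illustration.

```python
def sniff_audio_container(header: bytes) -> str:
    """Classify an audio file from its first 12 bytes."""
    if len(header) < 12:
        return "unknown"
    form_id, form_type = header[:4], header[8:12]
    # WAV files are RIFF containers whose form type is WAVE.
    if form_id == b"RIFF" and form_type == b"WAVE":
        return "wav"
    # AIFF and AIFF-C files are IFF containers that start with FORM.
    if form_id == b"FORM" and form_type in (b"AIFF", b"AIFC"):
        return "aiff"
    return "unknown"


def sniff_audio_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

For the file shown above, sniff_audio_file("system_speech_synthesis.wav") would be expected to report "aiff". That would explain the error: the wave module only accepts files that start with the RIFF id.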

badbye commented 10 months ago

https://github.com/nateshmbhat/pyttsx3/issues/142#issuecomment-1013533459

Confirmed. I tested pyttsx3's save_to_file: it writes AIFF on macOS and WAV on Linux.
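
One possible workaround is to convert the AIFF data to WAV before handing it to wave. The sketch below rests on assumptions: it handles only uncompressed 16-bit PCM (as in the mediainfo output above), and the helper names are hypothetical, not part of pyttsx3 or RealtimeTTS. It parses the chunks by hand because the standard library's aifc module was removed in Python 3.13.

```python
import io
import struct
import wave


def _decode_extended(raw: bytes) -> float:
    """Decode the 80-bit extended float AIFF uses for the sample rate."""
    exponent, mantissa = struct.unpack(">HQ", raw)
    sign = -1.0 if exponent & 0x8000 else 1.0
    exponent &= 0x7FFF
    if exponent == 0 and mantissa == 0:
        return 0.0
    return sign * mantissa * 2.0 ** (exponent - 16383 - 63)


def aiff_to_wav_bytes(aiff: bytes) -> bytes:
    """Convert 16-bit big-endian PCM AIFF data to WAV data."""
    if aiff[:4] != b"FORM" or aiff[8:12] not in (b"AIFF", b"AIFC"):
        raise ValueError("not an AIFF file")
    is_aifc = aiff[8:12] == b"AIFC"

    # Walk the chunk list and keep the two chunks we need.
    comm = sound = None
    pos = 12
    while pos + 8 <= len(aiff):
        chunk_id, size = struct.unpack(">4sI", aiff[pos:pos + 8])
        body = aiff[pos + 8:pos + 8 + size]
        if chunk_id == b"COMM":
            comm = body
        elif chunk_id == b"SSND":
            sound = body
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    if comm is None or sound is None:
        raise ValueError("missing COMM or SSND chunk")

    channels, frames, bits = struct.unpack(">hIh", comm[:8])
    rate = int(round(_decode_extended(comm[8:18])))
    if is_aifc and comm[18:22] not in (b"NONE", b"twos"):
        raise ValueError("compressed AIFC is not handled by this sketch")
    if bits != 16:
        raise ValueError("only 16-bit samples are handled by this sketch")

    # SSND starts with an offset and a block size, then the sample data.
    offset = struct.unpack(">I", sound[:4])[0]
    pcm = sound[8 + offset:8 + offset + frames * channels * 2]
    pcm = pcm[:len(pcm) - (len(pcm) % 2)]  # drop a dangling byte, if any

    # AIFF stores samples big-endian, WAV little-endian: swap each byte pair.
    swapped = bytearray(len(pcm))
    swapped[0::2] = pcm[1::2]
    swapped[1::2] = pcm[0::2]

    out = io.BytesIO()
    with wave.open(out, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(bytes(swapped))
    return out.getvalue()
```

The result can be opened with wave.open(io.BytesIO(wav_bytes), 'rb') or written back to disk, so the reading code would not need to change.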

KoljaB commented 10 months ago

Thank you very much @badbye for pointing this out.

Fix is now available in v0.3.34