Romkabouter / ESP32-Rhasspy-Satellite

This repo implements an ESP32 standalone MQTT audio streamer. It is designed to work as a satellite for Rhasspy (https://rhasspy.readthedocs.io/en/latest/). It supports multiple devices.
GNU General Public License v3.0
359 stars, 64 forks

All Audio is just hissing and crackling. #53

whinis closed this issue 3 years ago

whinis commented 3 years ago

I have a Matrix Voice ESP32 and everything I attempt to play over MQTT comes out as hissing and crackling, or is extremely quiet. I have tried with Rhasspy directly, and even the beeps don't come through, just a barely audible hiss.

One of my responses from Rhasspy reads from a list; some words are audible but quiet and others are just hissing.

Then I tried streaming audio to MQTT from Python; however, all I got was crackling.

Changing the volume or using headphones seems to have no effect.

Romkabouter commented 3 years ago

What is the sample rate of your audio files?

whinis commented 3 years ago

I tried 44.1k, 16k, and 8k

Romkabouter commented 3 years ago

Hmm, everything below 44.1 kHz should be OK. Sadly I cannot reproduce this, so I have no clue what the issue is.

whinis commented 3 years ago

I am wondering if I somehow got a Matrix with a bad amp?

ayavilevich commented 3 years ago

@whinis hey, see if there is a correlation between the length of the audio and the problems.

You can try this to dump the stream from mqtt to a file to do an offline check. Maybe it will help to pin-point the issue. https://github.com/ayavilevich/rhasspy-helper/blob/main/logAudio.js
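For the offline check, a minimal Python sketch (standard library only) that reads the header of a dumped payload and reports its format and duration, so it can be compared with the `Samplerate: ..., Channels: ...` line the satellite prints and with the length of the audio. It assumes the dumped payload is a complete WAV file with its header; `describe_wav` and `make_silent_wav` are hypothetical helpers, not part of the linked logAudio.js or of this project.

```python
import io
import wave


def describe_wav(payload: bytes) -> dict:
    """Return basic format info for a complete WAV file held in memory."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def make_silent_wav(rate: int, channels: int, seconds: float) -> bytes:
    """Build a silent 16-bit WAV in memory (handy for trying describe_wav)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buf.getvalue()


print(describe_wav(make_silent_wav(44100, 2, 0.5)))
```

To inspect a real dump, read the file in binary mode and pass its bytes to `describe_wav`.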

Dimitar-Boychev commented 3 years ago

Hello, I have the same issue on a Matrix Voice ESP32... I got it working for a few tests with different voices in Rhasspy and now I am back to hissing and crackling. I am looking into it too... The wake word works.

Dimitar-Boychev commented 3 years ago

OK, I have no idea why, but after playing around with the settings in the web interface (changing the audio output, mute input, mute output, volume, etc.), I now have incoming audio (Matrix Voice -> Rhasspy) working OK. Text to Speech via the "Speak" button in Rhasspy's web interface plays fine through the headphone jack. The responses from Home Assistant (POST to :12101/api/text-to-speech) are fine. But the sound Rhasspy makes when you say the wake word is hissing...

There is something strange going on with the volume... at one point I got it to start low and progressively increase while speaking; then I restarted the Matrix Voice and now it's back to normal, but Rhasspy's wake sound is still hissing...

whinis commented 3 years ago

Mine also started after I changed the volume because it seemed low, but I have been unable to recover it.

Romkabouter commented 3 years ago

Good pointers! I never change the volume, so it might indeed be the issue. I could not reproduce it, but I have some new leads now :)

Could you try to reflash the Matrix Voice?

You need to reflash, but note that the volume setting is written to a memory address. I will also try to see if I can reproduce it when adjusting the volume.

ayavilevich commented 3 years ago

May I also suggest posting the log from the ESP32's serial output here? Just remove any duplicate lines to keep it readable. This will show errors, if any, and details about the audio streams being played.

Dimitar-Boychev commented 3 years ago

Hmm, I didn't erase... OK, I will try and get back to you with more info. @ayavilevich maybe pins 8 and 10 on the Raspberry Pi GPIO header side can get me the serial output log... https://matrix-io.github.io/matrix-documentation/matrix-voice/resources/pinout/ Do you know an easier way (something in the MATRIX Creator software maybe? I am looking at it for the first time now...), or should I use the usual USB-to-serial adapter with a 3.3 V level converter, the way ESP chips are usually programmed?

I am shooting in the dark since I haven't looked at the source, but are you using the FPGA for anything? I see that deploy.sh flashes only the ESP32 chip...

Romkabouter commented 3 years ago

@Dimitar-Boychev if you attach the Matrix Voice to the same Pi you used to install it, you can use minicom:

```
sudo apt-get install minicom
sudo minicom
```

Configure the serial port as /dev/ttyS0 and the logs will be printed :)

Dimitar-Boychev commented 3 years ago

Initial logs with the firmware from yesterday:

```
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Buffer underflow
!!! TONS of Buffer underflow !!!
Enter HotwordDetected
Buffer underflow
Done
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Done
Enter Idle
```

Reset via:

```
sudo voice_esp32_enable
esptool.py --chip esp32 --port /dev/ttyS0 --baud 115200 --before default_reset --after hard_reset erase_flash
```

Reflashed via ./deploy.sh from Git Bash. Rhasspy wakeup sounds: hissing. TTS from the Rhasspy web interface: hissing too.

```
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Buffer underflow
Done
Enter HotwordDetected
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16   <-- WAKEUP WORD
Done
Enter Idle
Samplerate: 22050, Channels: 1, Format: 1, Bits per Sample: 16   <-- TTS FROM WEB INTERFACE
Done
```

Changing GAIN to 4 from the web interface -> does not fix TTS.
Changing volume to 75 from the web interface -> ESP restart because of a panic:

```
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandl.
Core 0 register dump:
PC : 0x4017d4f3 PS : 0x00060230 A0 : 0x8017d6c3 A1 : 0x3
A2 : 0x00000000 A3 : 0x3fff58b8 A4 : 0x00000000 A5 : 0x3
A6 : 0x00000014 A7 : 0x00000000 A8 : 0x80087a60 A9 : 0x3
A10 : 0x3fff6634 A11 : 0x0000166f A12 : 0x68514260 A13 : 0x0
A14 : 0x00060a23 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xf

ELF file SHA256: 0000000000000000

Backtrace: 0x4017d4f3:0x3ffdd620 0x4017d6c0:0x3ffdd640 0x4017d702:0x3ffdd660 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
{ "mqtt_host": "XXX.XXX.XXX.XXX", "mqtt_port": 1883, "mqtt_user": "username", "mqtt_pass": "password", "mute_input": false, "mute_output": false, "amp_output": 1, "brightness": 15, "hotword_brightness": 15, "hotword_detection": 1, "volume": 75, "gain": 4 }
Creating I2Stask
Enter WifiDisconnected
Total heap: 272168
Free heap: 188544
Enter WifiConnected
```

TTS still broken:

```
Enter Idle
Samplerate: 22050, Channels: 1, Format: 1, Bits per Sample: 16
Done
```

Output: headphones -> speakers

```
E (137116) task_wdt: Task watchdog got triggered. The following tasks did not r:
E (137116) task_wdt: - IDLE0 (CPU 0)
E (137116) task_wdt: Tasks currently running:
E (137116) task_wdt: CPU 0: I2Stask
E (137116) task_wdt: CPU 1: IDLE1
E (137116) task_wdt: Aborting.
abort() was called at PC 0x401545e0 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40089948:0x3ffbfb00 0x40089bc5:0x3ffbfb20 0x401545e0:0x3ffbfb40 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
{ "mqtt_host": "XXX.XXX.XXX.XXX", "mqtt_port": 1883, "mqtt_user": "username", "mqtt_pass": "password", "mute_input": false, "mute_output": false, "amp_output": 0, "brightness": 15, "hotword_brightness": 15, "hotword_detection": 1, "volume": 75, "gain": 4 }
Creating I2Stask
Enter WifiDisconnected
Total heap: 272184
Free heap: 188688
Enter WifiConnected
Connected to Wifi with IP: YYY.YYY.YYY.YYY, SSID: WIFI_SSID, BSSID: AA:AA:AA:AA:AA
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Enter MQTTConnected
Connected as satellite
Enter Idle
```

Output: speakers -> headphones (no restart this time), TTS hissing.
Hotword brightness: 15 -> 40 (restart log below), TTS hissing.

```
Parameter mqtt_host, value XXX.XXX.XXX.XXX
Parameter mqtt_port, value 1883
Parameter mqtt_user, value username
Parameter mqtt_pass, value password
Parameter amp_output, value 1
Parameter volume, value 75
Parameter brightness, value 15
Parameter hw_brightness, value 40
Hotword brightness changed
Parameter hotword_detection, value 1
Parameter gain, value 4
Settings changed, saving configuration
Saving configuration
{ "mqtt_host": "XXX.XXX.XXX.XXX", "mqtt_port": 1883, "mqtt_user": "username", "mqtt_pass": "password", "mute_input": false, "mute_output": false, "amp_output": 1, "brightness": 15, "hotword_brightness": 40, "hotword_detection": 1, "volume": 75, "gain": 4 }
Enter MQTTDisconnected
Connect failed, retry
Audio connected: 0, Async connected: 1
Enter MQTTDisconnected
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
E (1659: E (165971) task_wdt: - IDLE0 (CPU 0)
E (165971) task_wdt: Tasks currently running:
E (165971) task_wdt: CPU 0: I2Stask
E (165971) task_wdt: CPU 1: loopTask
E (165971) task_wdt: Aborting.
abort() was called at PC 0x401545e0 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40089948:0x3ffbfb00 0x40089bc5:0x3ffbfb20 0x401545e0:0x3ffbfb40 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
{ "mqtt_host": "XXX.XXX.XXX.XXX", "mqtt_port": 1883, "mqtt_user": "username", "mqtt_pass": "password", "mute_input": false, "mute_output": false, "amp_output": 1, "brightness": 15, "hotword_brightness": 40, "hotword_detection": 1, "volume": 75, "gain": 4 }
Creating I2Stask
Enter WifiDisconnected
Total heap: 272184
Free heap: 188688
Enter WifiConnected
Connected to Wifi with IP: YYY.YYY.YYY.YYY, SSID: WIFI_SSID, BSSID: AA:AA:AA:AA:A
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Enter MQTTConnected
Connected as satellite
Enter Idle
```

Manually woke up Rhasspy via the button in the web interface -> playing the recording is hissing -> downloading it as WAV and playing it on the PC is perfectly fine.

Volume moved to 100:

```
Parameter mqtt_host, value XXX.XXX.XXX.XXX
Parameter mqtt_port, value 1883
Parameter mqtt_user, value username
Parameter mqtt_pass, value password
Parameter amp_output, value 1
Parameter volume, value 100
Volume changed
Parameter brightness, value 15
Parameter hw_brightness, value 40
Parameter hotword_detection, value 1
Parameter gain, value 4
Settings changed, saving configuration
Saving configuration
{ "mqtt_host": "XXX.XXX.XXX.XXX", "mqtt_port": 1883, "mqtt_user": "username", "mqtt_pass": "password", "mute_input": false, "mute_output": false, "amp_output": 1, "brightness": 15, "hotword_brightness": 40, "hotword_detection": 1, "volume": 100, "gain": 4 }
Enter MQTTDisconnected
Connect failed, retry
Audio connected: 0, Async connected: 1
Enter MQTTDisconnected
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Connecting MQTT: XXX.XXX.XXX.XXX, 1883
Enter Md
Connected as satellite
[E][AsyncTCP.cpp:885] _lwip_fin(): 0x3fff4834 != 0x3fff49d8
Enter Idle
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
[E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8
```

At this point, [E][AsyncTCP.cpp:953] _poll(): 0x3fff4834 != 0x3fff49d8 was repeating once every 0.5 seconds.

Power-cycled the Matrix Voice (removed it from the header and plugged it back in). TTS: no problems, works flawlessly now.

```
Connected as satellite
Enter Idle
Samplerate: 22050, Channels: 1, Format: 1, Bits per Sample: 16
Done
```

Using the wake word, same thing with the hissing:

```
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Buffer underflow
Enter HotwordDetected
Buffer underflow
Done
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Done
Enter Idle
Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16
Done
```

At this point I cleared the flash again and wrote the same firmware to test. If I don't touch anything, TTS works OK, but the Rhasspy wakeup sounds are broken.
Gain to 5 -> TTS OK.
Volume change -> crash, but after the reboot TTS is still OK:

```
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][WiFiClient.cpp:463] available(): fail on fd -1, errno: 11, "No more process"
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
[E][AsyncTCP.cpp:953] _poll(): 0x3fff5524 != 0x3fff5968
E (154106) task_wdt: Task watchdog got triggered. The following tasks did not r:
E (154106) task_wdt: - IDLE0 (CPU 0)
E (154106) task_wdt: Tasks currently running:
E (154106) task_wdt: CPU 0: I2Stask
E (154106) task_wdt: CPU 1: IDLE1
E (154106) task_wdt: Aborting.
abort() was called at PC 0x401545e0 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40089948:0x3ffbfb00 0x40089bc5:0x3ffbfb20 0x401545e0:0x3ffbfb40 0x0

Rebooting...
I (10) boot: ESP-IDF v3.1 2nd stage bootloader
I (10) boot: compile time 11:08:24
I (11) boot: Enabling RNG early entropy source...
I (14) boot: SPI Speed : 40MHz
I (18) boot: SPI Mode : DIO
I (22) boot: SPI Flash Size : 4MB
I (26) boot: Partition Table:
I (29) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00005000
I (44) boot: 1 otadata OTA data 01 00 0000e000 00002000
I (52) boot: 2 app0 OTA app 00 10 00010000 001e0000
I (59) boot: 3 app1 OTA app 00 11 001f0000 001e0000
I (67) boot: 4 eeprom Unknown data 01 99 003f0000 00001000
I (74) boot: 5 spiffs Unknown data 01 82 003f1000 0000f000
I (82) boot: End of partition table
I (86) boot: No factory image, trying OTA 0
I (91) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x402c4 (26p
I (192) esp_image: segment 1: paddr=0x000502ec vaddr=0x3ffbdb60 size=0x03ea8 ( d
I (198) esp_image: segment 2: paddr=0x0005419c vaddr=0x40080000 size=0x00400 ( d
I (199) esp_image: segment 3: paddr=0x000545a4 vaddr=0x40080400 size=0x0ba6c ( d
I (227) esp_image: segment 4: paddr=0x00060018 vaddr=0x400d0018 size=0xc6c00 (8p
I (512) esp_image: segment 5: paddr=0x00126c20 vaddr=0x4008be6c size=0x04978 ( d
I (531) boot: Loaded app from partition at offset 0x10000
I (531) boot: Disabling RNG early entropy source...
Booting
Matrix Voice Initialized
Loading configuration
```

Romkabouter commented 3 years ago

Samplerate: 44100, Channels: 2, Format: 1, Bits per Sample: 16

I see the sample rate of 44100; this is known to cause hissing.

As far as I can tell now, it might be related to a combination of volume and gain, because with a fresh flash and no adjustments, TTS works OK. The hissing from the wake WAVs is most probably due to the 44100. Can you change those to 22050 and try again?

My focus will be to change the volume and gain settings (I have never actually used gain and do not know if it even works), then see if I can reproduce the hissing and also the crashes.
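A crude way to produce 22050 Hz versions of 44100 Hz wake sounds with only the Python standard library is sketched below. It assumes 16-bit PCM WAV input and averages each pair of consecutive frames per channel, i.e. a very simple low-pass followed by decimation by 2; a dedicated resampler (for example the one in Audacity) will sound better. `halve_rate` is a hypothetical helper, not part of this project or of Rhasspy.

```python
import array
import io
import sys
import wave


def halve_rate(src: bytes) -> bytes:
    """Downsample a 16-bit PCM WAV by 2 (e.g. 44100 -> 22050 Hz).

    Each output frame is the average of two consecutive input frames,
    per channel. A trailing odd frame is dropped.
    """
    with wave.open(io.BytesIO(src), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is supported")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    out = array.array("h")
    step = 2 * channels  # two frames of interleaved samples
    for i in range(0, len(samples) - step + 1, step):
        for ch in range(channels):
            out.append((samples[i + ch] + samples[i + channels + ch]) // 2)
    if sys.byteorder == "big":
        out.byteswap()

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate // 2)
        wav.writeframes(out.tobytes())
    return buf.getvalue()
```

Usage would be reading a wake sound in binary mode, passing its bytes to `halve_rate`, and writing the result to a new file before selecting it in Rhasspy's sound settings.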

Dimitar-Boychev commented 3 years ago

Yes, as soon as I saw 44100 in the log I knew what was needed :) I resampled the three WAV files down to 22050 and changed them in Rhasspy's web interface under Settings -> Sound, and now it is all good :) I don't know how I broke the TTS sounds after the first erase and reflash, how a restart helped, or why there were no problems with TTS the second time... Maybe some of the other actions broke it the first time? Anyway, thanks a lot :)

whinis commented 3 years ago

I am waiting on a new RPi to test; the one I used to set up the Matrix is currently heavily integrated into a 3D printer.

Romkabouter commented 3 years ago

@whinis did you have the chance to retest?

whinis commented 3 years ago

I have been unable to get the serial output on my Pi Zero W to work, and I have not been able to figure out why. In the meantime I purchased 3 of the Echos, and playing long audio files causes them to crackle loudly, with the song audible in the background of the crackling. Changing the volume also seems to have no effect, similar to the Matrix.

Romkabouter commented 3 years ago

Is there a possibility to attach the audio files?

whinis commented 3 years ago

Not due to copyright; it's from my audio library. I am looking for some openly licensed songs to replicate the issue with so that I may

whinis commented 3 years ago

I found this royalty-free music on Pixabay by Michael Kobrin. I used Audacity to convert the MP3 to a WAV and resampled it to 16000. To my ears on my desktop it sounds no different, but I'm also not an audiophile.

I then used this Python to send the result to MQTT:

```python
import paho.mqtt.publish as mqtt

# Publish the whole WAV file (header included) as a single playBytes payload.
with open("<musicFolder>\\nightlife-michael-kobrin-95bpm-3783.wav", "rb") as f:
    data = f.read()
mqtt.single("hermes/audioServer/echo_voice_livingroom/playBytes/myID", payload=data, qos=0,
            retain=False, hostname="homeassistant.local", port=1883, client_id="", keepalive=60,
            will=None, auth={"username": "test", "password": "alsotest"}, tls=None, transport="tcp")
```

And the result is lots of popping and crackling, with the occasional note of the song playing on my Echo: https://cloud.whinis.com/index.php/s/sNiYCSAtzKYTP2r (I tried attaching the file but kept getting an "is not included in the list" error)

The problem seems less bad if you make the wav a mono file but still bad.

Romkabouter commented 3 years ago

OK, thanks. I will see if I can reproduce and maybe fix it :)

whinis commented 3 years ago

Not sure if it's related or I am just playing audio wrong, but with the following code it seems all 3 Echos I have and the Matrix are just outputting clicking from the microphone as I try to debug why my hotword is not working:


```python
from pydub import AudioSegment
from pydub.playback import play
import paho.mqtt.client as mqtt

topic = "hermes/audioServer/echo_voice_bedroom/audioFrame"

user = "test"
pw = "test"
host = "homeassistant.local"
port = 1883

def on_message(client, obj, msg: mqtt.MQTTMessage):
    # Advanced usage, if you have raw audio data:
    sound = AudioSegment(
        # raw audio data (bytes)
        data=msg.payload,

        # 2 byte (16 bit) samples
        sample_width=2,

        # 16 kHz frame rate
        frame_rate=16000,

        # mono
        channels=1
    )
    play(sound)

mqttc = mqtt.Client()
mqttc.on_message = on_message
mqttc.username_pw_set(user, pw)
mqttc.connect(host, port)

mqttc.subscribe(topic, 0)

rc = 0

while rc == 0:
    rc = mqttc.loop()
print("rc: " + str(rc))
```

Romkabouter commented 3 years ago

The devices are sending a huge number of small WAV files, which your code tries to play. That is probably not working; you are also not parsing the WAV header.

Try this script: https://github.com/Romkabouter/ESP32-Rhasspy-Satellite/blob/voco/record.py It saves a couple of seconds to a file.

But your topic is for the OUTPUT, so the recording of the mic; I do not know if that is what you want.
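The point about small WAV files can be sketched in Python: since each `audioFrame` message is a complete little WAV file, a recorder has to parse and discard each header, concatenate only the PCM frames, and write a single header for the result. This is a standard-library sketch of that idea, not the linked record.py; `join_wav_chunks` is a hypothetical name, and it assumes every chunk has the same format.

```python
import io
import wave
from typing import Iterable


def join_wav_chunks(chunks: Iterable[bytes]) -> bytes:
    """Merge many small WAV payloads into one WAV file.

    Each chunk's header is parsed (and discarded) so that only raw PCM
    frames are concatenated. All chunks must share the same format.
    """
    params = None
    pcm = bytearray()
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"format changed mid-stream: {fmt} != {params}")
            pcm += wav.readframes(wav.getnframes())
    if params is None:
        raise ValueError("no chunks received")

    channels, width, rate = params
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(bytes(pcm))
    return out.getvalue()
```

In an MQTT client, `on_message` would append each `msg.payload` to a list, and after a few seconds the list would be passed to `join_wav_chunks` and the result written to disk.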

Romkabouter commented 3 years ago

I found this royalty free music on pixabay by Michael Kobrin. Used Audacity to take the mp3 and turn it into a wav and resampled to 16000. And the result is lots of popping and crackling with the occasional note of the song playing on my echo https://cloud.whinis.com/index.php/s/sNiYCSAtzKYTP2r (I tried attaching the file but kept getting an "is not included in the list" error)

The sample you provided is 44100 Hz.

That is a known issue, but when I resample it to 11025 I also hear the issue. I have no solution yet

whinis commented 3 years ago

The devices are sending a huge number of small WAV files, which your code tries to play. That is probably not working; you are also not parsing the WAV header.

Try this script: https://github.com/Romkabouter/ESP32-Rhasspy-Satellite/blob/voco/record.py It saves a couple of seconds to a file.

But your topic is for the OUTPUT, so the recording of the mic; I do not know if that is what you want.

Yes, this worked much better. It seems none of the Echos have any real pickup. I can open that as another issue, but even right next to it and regardless of the gain setting, I can barely hear myself in the recording. Meanwhile I am loud and clear on the Matrix Voice at the same distance.

Romkabouter commented 3 years ago

It seems none of the Echos have any real pickup. I can open that as another issue, but even right next to it and regardless of the gain setting, I can barely hear myself in the recording. Meanwhile I am loud and clear on the Matrix Voice at the same distance.

This issue is for audio output, so if you want you can open a separate issue. I just use the I2S code from M5's own examples and my hotword is triggering, but I also found the recording volume very low. It might be an issue with the device itself, which I cannot fix.

Romkabouter commented 3 years ago

Good news and bad news. The good news is, I found the issue. The bad news is that I cannot think of a way to solve it.

What happens is that when a large file is received, the incoming data arrives faster than the output can write it. This is due to the sample rate.

So what I have now is a delay to hold back the ring buffer push, but that has this very annoying side effect. That delay is actually listed in the async-mqtt-client Getting Started guide as something not to do: http://marvinroger.viewdocs.io/async-mqtt-client/1.-Getting-started/

I need to find a solution for the fact that the async lib processes incoming data faster than the audio writes the data to the speakers. I have a very large ring buffer (60000 bytes), but it fills faster than it is emptied. The ring buffer also cannot be much bigger due to memory limitations.
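To put the 60000-byte ring buffer in perspective, a short Python calculation of how much playback time it holds at the formats seen in this thread (44100 Hz stereo wake sounds, 22050 Hz mono TTS, 16000 Hz mono), and how quickly it overflows when data arrives faster than it is played. The 500 kB/s incoming rate in the example is a made-up figure for illustration, not a measurement from the ESP32.

```python
def byte_rate(sample_rate: int, channels: int, bits: int = 16) -> int:
    """Bytes of PCM consumed per second of playback."""
    return sample_rate * channels * bits // 8


def buffer_seconds(buffer_bytes: int, sample_rate: int, channels: int, bits: int = 16) -> float:
    """How many seconds of audio fit in a buffer of the given size."""
    return buffer_bytes / byte_rate(sample_rate, channels, bits)


def seconds_until_full(buffer_bytes: int, in_rate: float, out_rate: float) -> float:
    """Time until an initially empty buffer overflows (rates in bytes/s).

    If the buffer drains at least as fast as it fills, it never overflows.
    """
    if in_rate <= out_rate:
        return float("inf")
    return buffer_bytes / (in_rate - out_rate)


RING_BUFFER = 60000  # bytes, the size mentioned in the comment above

for rate, ch in [(44100, 2), (22050, 1), (16000, 1)]:
    print(f"{rate} Hz x{ch}: {buffer_seconds(RING_BUFFER, rate, ch):.2f} s of audio")

# Hypothetical: MQTT delivers 500 kB/s while 44.1 kHz stereo is being played.
print(f"{seconds_until_full(RING_BUFFER, 500_000, byte_rate(44100, 2)):.3f} s until full")
```

At 44100 Hz stereo the buffer holds only about a third of a second, which is why a burst of incoming data at network speed fills it almost immediately.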

whinis commented 3 years ago

Could you empty part of the ring buffer if it fills? The idea being that 250 or 500 bytes may be a few ms of sound, so losing them shouldn't be very noticeable unless one looks for it. If the ring buffer fills, you clear out 250-500 bytes ahead to make room and continue on.
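The suggestion amounts to a ring buffer that discards the oldest bytes when a new write does not fit. A minimal Python model of that policy is below; `DropOldestBuffer` is a hypothetical name, and the firmware's actual buffer is C++ code on the ESP32 that is not shown in this thread. Counting the dropped bytes makes the cost of the approach visible.

```python
from collections import deque


class DropOldestBuffer:
    """Byte FIFO that evicts the oldest bytes instead of blocking when full."""

    def __init__(self, capacity: int) -> None:
        self._buf = deque(maxlen=capacity)
        self.dropped = 0  # total bytes discarded so far

    def push(self, data: bytes) -> None:
        overflow = len(self._buf) + len(data) - self._buf.maxlen
        if overflow > 0:
            self.dropped += overflow
        self._buf.extend(data)  # a deque with maxlen discards from the left

    def pop(self, n: int) -> bytes:
        n = min(n, len(self._buf))
        return bytes(self._buf.popleft() for _ in range(n))

    def __len__(self) -> int:
        return len(self._buf)


# 121 messages of 1460 bytes into a 60000-byte buffer that is never drained.
buf = DropOldestBuffer(60000)
for _ in range(121):
    buf.push(bytes(1460))
print(buf.dropped, "bytes dropped")
```

When data arrives much faster than it is played, the bytes lost are not a few hundred but most of the stream, which is the objection raised in the reply that follows.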

Ideally I would like to use this at some point for phone calls or music played through my homeassistant server.

Romkabouter commented 3 years ago

Yes, you can empty it, but then your audio will have gaps. You will absolutely hear a few missing ms of sound. Also, 500 bytes is not nearly enough, sadly. The MQTT client receives about 1460 bytes per message. When I resample the audio file to 11025 Hz and trim it to 4 seconds, the file is 177296 bytes. With this, the onMQTTMessage callback is called around 121 times. I am now looking for a way to maybe buffer to a file or something, but that really just moves the problem.
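The figures in this reply can be checked with a few lines of Python. The chunk size, file size, duration and sample rate are the ones quoted above; that the file is 16-bit stereo is an inference from its size (about 4 bytes per frame), not something stated in the thread.

```python
import math

MQTT_CHUNK = 1460    # approx. bytes per message callback (from the thread)
FILE_BYTES = 177296  # 4 s at 11025 Hz (from the thread)
SECONDS = 4
RATE = 11025


def callbacks_needed(total_bytes: int, chunk: int = MQTT_CHUNK) -> int:
    """Number of message callbacks needed to deliver total_bytes."""
    return math.ceil(total_bytes / chunk)


def ms_of_audio(n_bytes: int, sample_rate: int, channels: int, bits: int = 16) -> float:
    """Playback time, in milliseconds, represented by n_bytes of PCM."""
    return 1000 * n_bytes / (sample_rate * channels * bits // 8)


# Roughly 4 bytes per frame, consistent with 16-bit stereo.
bytes_per_frame = FILE_BYTES / (SECONDS * RATE)

print(callbacks_needed(FILE_BYTES), "callbacks")
print(f"{bytes_per_frame:.2f} bytes/frame")
print(f"500 bytes = {ms_of_audio(500, RATE, 2):.1f} ms at 11025 Hz stereo")
print(f"one chunk = {ms_of_audio(MQTT_CHUNK, RATE, 2):.1f} ms at 11025 Hz stereo")
```

A single MQTT message already carries roughly three times as much audio as the 500 bytes proposed for dropping.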

chris-kuhr commented 3 years ago

Shouldn't the Rhasspy server provide an isochronous stream? Is it transmitting erratically or at fixed intervals?

Romkabouter commented 3 years ago

No, it just publishes the whole audio data in one payload over MQTT

chris-kuhr commented 3 years ago

Is it different with UDP streaming?

Romkabouter commented 3 years ago

Yes, UDP streaming sends raw packets with a certain block length.

chris-kuhr commented 3 years ago

Do you already have an idea how to implement it? I think MQTT streaming is a dead end here...

Romkabouter commented 3 years ago

I am bound to MQTT streaming because that is what Rhasspy uses. Sadly, some functionality is not implemented in arduino-esp32, like vGetTaskByName. If it were, I could suspend the task involved in the AsyncTCP library. I have not found another suitable lib yet, but I am now thinking of forking the code or implementing a dedicated client.

chris-kuhr commented 3 years ago

You could use MQTT for transmitting the audio to Rhasspy and UDP to receive from it. The Tx side does not need buffering, in contrast to the Rx side. With a raw PCM UDP stream, however, you would not need large buffers, since Rhasspy already does that with its playout buffers, also maintaining a constant packet interval...
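The "constant packet interval" idea can be made concrete: with raw PCM, a sender splits the audio into fixed-size blocks and releases each block at the playback time of its first sample, so the receiver gets data exactly as fast as it plays it. The Python sketch below only computes that schedule; the block size and format are example values, `paced_blocks` is a hypothetical helper, and, as the following replies point out, Rhasspy did not offer UDP as an audio play method at the time.

```python
from typing import Iterator, Tuple


def paced_blocks(pcm: bytes, block_bytes: int, sample_rate: int,
                 channels: int, bits: int = 16) -> Iterator[Tuple[float, bytes]]:
    """Yield (send_time_s, block) pairs for an isochronous raw-PCM stream.

    Each block is scheduled at the playback time of its first sample.
    """
    frame_bytes = channels * bits // 8
    if block_bytes % frame_bytes:
        raise ValueError("block size must be a whole number of frames")
    bytes_per_second = sample_rate * frame_bytes
    for offset in range(0, len(pcm), block_bytes):
        yield offset / bytes_per_second, pcm[offset:offset + block_bytes]


# 512-byte blocks of 16 kHz mono 16-bit audio are 16 ms apart.
schedule = list(paced_blocks(bytes(2048), 512, 16000, 1))
print([round(t * 1000, 3) for t, _ in schedule])
```

A real sender would sleep until each `send_time_s` (relative to the stream start) before transmitting the block; only the timing logic is shown here.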

chris-kuhr commented 3 years ago

The question is whether the ESP32 has enough processing power to receive UDP on the same core as the I2S write task, in addition to the MQTT stuff...

Romkabouter commented 3 years ago

and UDP to receive from it.

That is currently not possible in Rhasspy. UDP is not an option for the audio play method.

chris-kuhr commented 3 years ago

I see now. I misread the docs...

I tried increasing the DMA buffer size up to 32 blocks of 512 bytes and also made the receiver thread call audioWrite() 32 times the data. No luck so far. The 44.1 kHz sample with the I2S port also at 44.1 kHz produces the same sound. I still have to try it with a 16 kHz audio sample.

Romkabouter commented 3 years ago

The problem is not the buffer. The actual issue is that you cannot pause the async MQTT task.

This is what happens

I have made a pause function in the async lib to prove my point, and indeed, audio plays much better. There are still some issues, but I was just proving the point to myself.

I need to find a way to synchronize the bytes coming in from MQTT with the playback of those bytes; otherwise there will be packet loss, causing this issue.
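The synchronization described here is backpressure: the receiving side has to wait when the playback buffer is full instead of overwriting or dropping data. In Python the idea can be modelled with a bounded `queue.Queue`, whose `put` blocks until the consumer frees a slot. This is only a model of the technique (the function name, queue size and delay are hypothetical); on the ESP32 the equivalent would be pausing the MQTT/TCP receive path, which is exactly what the async library did not allow.

```python
import queue
import threading
import time


def stream_with_backpressure(chunks, max_chunks: int = 4, play_delay: float = 0.001) -> bytes:
    """Move chunks from a fast producer to a slow consumer without loss.

    The bounded queue plays the role of the ring buffer: when it is full,
    put() blocks, which pauses the producer until playback catches up.
    """
    buf = queue.Queue(maxsize=max_chunks)
    played = bytearray()
    done = object()  # sentinel marking the end of the stream

    def producer():
        for chunk in chunks:  # arrives as fast as the "network" allows
            buf.put(chunk)    # blocks while the buffer is full
        buf.put(done)

    def consumer():
        while True:
            chunk = buf.get()
            if chunk is done:
                break
            time.sleep(play_delay)  # stand-in for the I2S write taking time
            played.extend(chunk)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(played)
```

Feeding it a list of 1460-byte chunks returns exactly the concatenated input, in order, however slow the consumer is; the price is that the producer is held back, which is the "pause" that had to be added to the async library.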

Romkabouter commented 3 years ago

Please check out https://github.com/Romkabouter/ESP32-Rhasspy-Satellite/releases/tag/v7.6 It should solve the audio playback issues

I will close this, but if it is still an issue, please reopen :)