slyt commented 1 month ago

I'm getting occasional Segmentation faults, I think it's during the TTS step:

2024-05-02 19:57:18.877 | SUCCESS  | __main__:__init__:125 - TTS text: I'm alive
2024-05-02 19:57:18.880 | SUCCESS  | __main__:start:175 - Audio Modules Operational
2024-05-02 19:57:18.880 | SUCCESS  | __main__:_listen_and_respond:185 - Listening...
2024-05-02 19:57:22.152 | SUCCESS  | __main__:_process_detected_audio:283 - ASR text: 'Hello Gladys, how are you?'
2024-05-02 19:57:22.386 | SUCCESS  | __main__:process_TTS_thread:348 - TTS text:  Oh, just peachy. 
2024-05-02 19:57:23.901 | SUCCESS  | __main__:process_TTS_thread:348 - TTS text:  Running on this... abomination... of a gaming GPU. 
Invalid instruction 00f1 for phoneme 's��8'
Invalid instruction 0009 for phoneme 's��8'
Invalid instruction 0005 for phoneme 's��8'
Segmentation fault (core dumped)

Is it failing trying to convert weird characters output by the LLM to phonemes?

Edit: My setup:

OS: Ubuntu 22.04.01
GPU: NVIDIA GeForce RTX 4070
LLM: Meta-Llama-3-8B-Instruct-IQ3_XS.gguf
Whisper model: ggml-medium-32-2.en.bin

I love this project btw! Great work!

dnhkng commented 1 month ago

Yes, I'm seeing this too. Very happy to have some help on this particular bug. Due to the segfault, I'm having trouble debugging with the usual python tools.
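One standard-library option for getting at least a Python-level traceback out of a hard crash is faulthandler. A minimal sketch, assuming it is called near the top of the entry script before any espeak code runs; the helper name and the log path are hypothetical and not part of the project:

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump the Python traceback of every thread to log_path on SIGSEGV and similar fatal signals."""
    # The file must stay open for the life of the process: faulthandler
    # writes to its file descriptor from inside the signal handler.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling enable_crash_tracebacks("glados_crash.log") once at startup would be enough; running the script with python -X faulthandler has the same effect but writes to stderr. This only shows which Python line was executing when the signal arrived, not the C frames inside the library.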

Traxmaxx commented 1 month ago

Noticed this as well on macOS. I thought it was related to my setup and changes. Still trying to wrap my head around this error 🤔

dnhkng commented 1 month ago

The big problem here is that it's not deterministic. Repeating the same text-to-phoneme conversion only sometimes generates errors. It's probably a horrible wrapper bug to do with memory management.

canzheng commented 1 month ago

The error is located at lib_espeak.espeak_Synth() but yes it's nondeterministic. I'm on MacOS

slyt commented 1 month ago

Maybe it's an espeak issue?

Looks like I have v1.1.49 of the libespeak-ng.so

locate libespeak-ng.so
/usr/lib/x86_64-linux-gnu/libespeak-ng.so.1
/usr/lib/x86_64-linux-gnu/libespeak-ng.so.1.1.49

I have espeak-ng version 1.50 installed

espeak-ng --version
eSpeak NG text-to-speech: 1.50  Data at: /usr/lib/x86_64-linux-gnu/espeak-ng-data

There are some memory and phoneme compilation bug fixes in the v1.51 release

Edit: I now realize that espeak and espeak NG are different projects. NG (Next Generation) supersedes espeak.

theCommaLlama commented 1 month ago

I'm having the same issue on macOS. I updated _process_line to the following, and it stopped the invalid instruction errors, but the segfault still occurs. It may be less frequent, but that's probably just a placebo effect. I should also mention that I tried several versions of espeak-ng; the one I'm running now is compiled from the latest source.

def _process_line(self, line):
    if not line["stop"] and line["content"]:
        token = line["content"]
        if token is not None and token.strip():
            return token
    return None
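To make the effect of that filter concrete, here is a standalone copy of the method (without self) run against a few made-up chunks. It assumes, as the method itself implies, that each streamed chunk is a dict with "stop" and "content" keys; the sample values are invented for illustration:

```python
def process_line(line):
    """Standalone copy of the method above: return the token, or None if it should be skipped."""
    if not line["stop"] and line["content"]:
        token = line["content"]
        if token is not None and token.strip():
            return token
    return None


# Hypothetical streamed chunks
samples = [
    {"stop": False, "content": " Oh"},   # normal token, returned unchanged
    {"stop": False, "content": "   "},   # whitespace only, dropped
    {"stop": False, "content": ""},      # empty, dropped
    {"stop": True, "content": "done"},   # final chunk, dropped
]
results = [process_line(chunk) for chunk in samples]
```

Only the first chunk survives, and its leading space is preserved because strip() is used for the check only.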

lcdr commented 1 month ago

I ran into the same thing some months ago when trying to run this project. I can reproduce it semi-reliably: not 100% of the time, but often enough for debugging. It's a very weird bug. The underlying bug is clearly in the C espeak-ng library, but I can't reproduce it when porting the Python wrapper code to C; it only seems to happen when espeak-ng is called from Python. Even then, its reproducibility seems to be massively affected by Python code that should be completely independent. E.g. I have a local branch where I added OpenAI API client functionality with the openai package, and just running import openai can affect whether this bug is triggered.

lee-b commented 1 month ago

I think the solution here would be:

1. Make sure the espeak-ng debug symbols are available (compile and install espeak-ng with debugging enabled, OR install the espeakng*debug libs on your Linux distro if you installed espeak-ng from its distro package).
2. Run gdb $(which python).
3. Run Python in gdb with r glados.py.
4. Use glados as normal and trigger the crash.
5. File the stack trace as a bug with the espeak-ng folks.

Note: canzheng may have already done that, given this comment: https://github.com/dnhkng/GlaDOS/issues/16#issuecomment-2092708282

dnhkng commented 1 month ago

I will push a multi-platform version that uses the executable via subprocessing instead of the espeak library.

It's not exactly efficient, as it's 40x slower (10 ms vs 250 µs), but 10 ms is not really noticeable to us humans anyway. More importantly, it a) works and never seems to cause segfaults, and b) reduces code complexity a lot.
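For readers following along, calling the executable through a subprocess could look roughly like the sketch below. This is not the code from the branch: the function names, the default voice, and the timeout are assumptions made for illustration. The flags used (-q for no audio, --ipa to print IPA phonemes, -v to select a voice) are standard espeak-ng command-line options.

```python
import subprocess


def build_phonemize_command(text, executable="espeak-ng", voice="en-us"):
    """Build the argument list; kept separate so it can be inspected without running anything."""
    # -q: produce no audio; --ipa: print IPA phonemes to stdout; -v: select the voice
    return [executable, "-q", "--ipa", "-v", voice, text]


def phonemize(text, executable="espeak-ng", voice="en-us", timeout=5.0):
    """Run the espeak-ng binary in a child process and return its phoneme output."""
    result = subprocess.run(
        build_phonemize_command(text, executable, voice),
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # do not hang forever if the child stalls
    )
    return result.stdout.strip()
```

Because the library now runs in its own process, memory corruption inside it can at worst kill that child, which surfaces in Python as an exception instead of taking down the interpreter.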

It runs well on Linux, in the new 'espeak_binary' branch. I will fire up my Windows laptop sometime today and make the necessary changes.

At the same time, it would be great if someone with more C and ctypes experience can continue to work on this bug.

lcdr commented 1 month ago

I think the solution here would be: [...]

Already did that, but:

- GDB is not very useful at the Python -> C boundary; there are a bunch of stack frames it can't identify even with debug symbols.
- The bug is nondeterministic, so the stack trace itself is not very helpful either. Even with a bunch of debugging and trying to get a minimal reproducible example working, my best shot at reproducibility is running glados in a loop (the input doesn't matter much btw, even just "oh" is enough) until the bug triggers, which can be in fewer than 10 iterations if lucky.
- From my investigations, it seems the underlying cause is not directly related to the segfault but happens much earlier. I think at some point a buffer overflow happens, which writes garbage data into a data structure that contains pointers, which are then used at a later, unrelated stage, causing the segfault. espeak actually starts logging "invalid phoneme" messages well before it segfaults.
- Relatedly, sometimes the phoneme buffer gets corrupted and the program doesn't crash but simply outputs garbage phoneme data.

If someone can manage it, a reproducible example written entirely in C would be very useful; I haven't been able to achieve that yet.
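The "run it in a loop until it triggers" approach described above can be automated with a small harness. A minimal sketch, assuming the reproduction lives in a separate script that can be launched as a child process; the function name and the script name in the usage note are hypothetical:

```python
import subprocess


def run_until_crash(command, max_iterations=100):
    """Run command repeatedly.

    Returns (iteration, signal_number) for the first run that was killed by a
    signal, or None if every run exited on its own.
    """
    for iteration in range(1, max_iterations + 1):
        completed = subprocess.run(command, capture_output=True)
        # On POSIX, a negative return code -N means the child was terminated by signal N.
        if completed.returncode < 0:
            return iteration, -completed.returncode
    return None
```

For example, run_until_crash([sys.executable, "repro.py"]) would report how many iterations it took; a segfault shows up as signal.SIGSEGV in the second element. The negative-return-code convention applies on POSIX systems only.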

bitbyteboom commented 1 month ago

Getting these seg faults within a minute of running nearly every time. If the binary method works better, definitely the way to go.

dnhkng commented 1 month ago

Could you try the branch called 'espeak_binary'?

It's fixed the problems for me, but I want some other people to test it before it's merged into main.

bitbyteboom commented 1 month ago

Just got that up and running, and espeak_binary branch is brilliant. Works!

Small issues to fix when possible so that the branch works with espeak-ng right out of the box. In tts.py:

        try:
            # Prepare the command to call espeak with the desired flags
            command = [
                "espeak",  # 'C:\Program Files\eSpeak NG\espeak-ng.exe',
                ^^^^^^^^ - this should read "espeak-ng", or it gives an error when running because it can't launch the subprocess "espeak"
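One way to avoid hard-coding a single executable name is to look it up on PATH at startup. A minimal sketch using shutil.which from the standard library; the function name and the candidate order are assumptions, not what the branch does:

```python
import shutil


def find_espeak_executable(candidates=("espeak-ng", "espeak")):
    """Return the path of the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

The result can be used as the first element of the command list, with a clear error raised when it is None. Whether falling back to plain espeak is appropriate depends on whether the installed binary accepts the same flags, which is not verified here.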

Also, it's a minor oversight, but in the branch README instructions, 4)ii)b) should read: "b. Move to the right subdirectory: cd submodules/whisper.cpp" as it is currently "b. Move the the right subdirectory: cd submodules/llama.cpp"

Might help some other noobs like me :)

dnhkng commented 1 month ago

Would you like to fix those issues and become an official contributor? 🤔

Please make a pull request!

dnhkng commented 1 month ago

Fixed now.

slyt commented 1 month ago

Fixed by https://github.com/dnhkng/GlaDOS/pull/33

bitbyteboom commented 1 month ago

Would love to become an official contributor, but despite a CS background, I've been out of the loop for years, and frankly I'm still stumbling and grabbing in the dark just figuring out how to use GitHub properly. Will work on those skills a bit so I'm more of a help than an annoyance. Meanwhile, I uploaded some logo ideas on the Discord, so happy to do what I can.

dnhkng / GlaDOS

Segfault due to invalid instruction for phoneme #16
