Jeffser / Alpaca

An Ollama client made with GTK4 and Adwaita
https://jeffser.com/alpaca
GNU General Public License v3.0

Bug: Alpaca resets when attempting to chat with a local model using ROCm-enabled GPU #306

Open Xathros1 opened 3 days ago

Xathros1 commented 3 days ago

Description

After installing the latest Flatpak version of Alpaca (v2.0.2) and the AMD extensions (Alpaca AMD support with ROCm), the application successfully loads a local model (e.g., LLaMA 3.1). However, when attempting to chat with the loaded model, Alpaca resets. The terminal outputs multiple errors, including Could not initialize Tensile host: No devices found.

Environment

Steps to Reproduce

  1. Install Alpaca using Flatpak (flatpak install com.jeffser.Alpaca).
  2. Install AMD extensions for Alpaca (Alpaca AMD support, which includes ROCm).
  3. Start Alpaca via the terminal (flatpak run com.jeffser.Alpaca).
  4. Load any local model (e.g., LLaMA 3.1) successfully.
  5. Attempt to chat with the loaded model.
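For reference, steps 1–3 as a single shell session. The extension ID on the second line is an assumption based on Flathub naming conventions; substitute whatever "Alpaca AMD support" resolves to on your system:

# Install the Alpaca client from Flathub
flatpak install flathub com.jeffser.Alpaca

# Install the AMD/ROCm extension (ID assumed; check with: flatpak search alpaca)
flatpak install flathub com.jeffser.Alpaca.Plugins.AMD

# Launch from a terminal so the Ollama logs are visible
flatpak run com.jeffser.Alpaca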

Expected Behavior

Alpaca should allow chatting with the loaded local model without any issues.

Actual Behavior

Alpaca resets immediately after attempting to chat with the model. The terminal output displays errors such as Could not initialize Tensile host: No devices found, among other messages.

Relevant Logs and Terminal Output

2024/09/16 15:12:18 routes.go:1125: INFO server config env="map[...]"
...
rocBLAS error: Could not initialize Tensile host: No devices found
[GIN] 2024/09/16 - 15:12:28 | 500 | 1.369775997s | 127.0.0.1 | POST "/api/generate"
time=2024-09-16T15:12:29.727+02:00 level=DEBUG source=server.go:431 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2024-09-16T15:12:29.978+02:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
ERROR [window.py | connection_error] Connection error
INFO [connection_handler.py | reset] Resetting Alpaca's Ollama instance

Additional Information

Suggestions

Please let me know if more information or logs are required to help diagnose and resolve this issue.

GregTheHun commented 3 days ago

Same here. I followed the instructions on Ollama to get a local instance running, but the Flatpak on Debian Testing doesn't appear to want to connect to it. For me it only happens with certain models, though, like Mistral.

Jeffser commented 3 days ago

Hi, thanks for the report. Could you send me the whole terminal output? I need to check whether ROCm is starting. Also, please check if setting HIP_VISIBLE_DEVICES to 1 or 0 fixes your problem; it might be using your iGPU by default.
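For example, a minimal sketch of passing the variable into the Flatpak sandbox (adjust the device index for your setup):

# Try one GPU index per run; HIP device indices start at 0
flatpak run --env=HIP_VISIBLE_DEVICES=0 com.jeffser.Alpaca
flatpak run --env=HIP_VISIBLE_DEVICES=1 com.jeffser.Alpaca

# Optionally persist the setting instead of passing it per run
flatpak override --user --env=HIP_VISIBLE_DEVICES=1 com.jeffser.Alpaca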

Xathros1 commented 3 days ago

I have captured the terminal output as requested.

Please find the attached output.txt file with the details of the problem.

If there's anything else you need from me, please let me know.

Thanks for looking into this! output.txt

With HIP_VISIBLE_DEVICES=1 set, I finally get output (it resets again when set to 0): output_HIP1.txt