Open auxilio-ab opened 1 year ago
I've been able to get better (though still not great) response times by going into the settings (the gear icon) and increasing self.threads from the default of 4 (I think that's what it was) to 10-13. Generation then uses a LOT more CPU, and the responses are slightly faster.
Still like 30-40 sec.
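A common rule of thumb for CPU-bound token generation (llama.cpp-style backends) is one thread per *physical* core; going past that into hyperthreads often yields diminishing or negative returns, which may explain why pushing the count higher didn't help. A minimal sketch of picking a thread count that way, assuming the logical core count is roughly double the physical one:

```python
import os

def suggest_threads():
    """Suggest a thread count for CPU-bound token generation.

    Assumption: hyperthreading roughly doubles os.cpu_count(), and
    inference usually runs best with one thread per physical core,
    so we halve the logical count (minimum 1).
    """
    logical = os.cpu_count() or 1
    return max(1, logical // 2)

print(suggest_threads())
```

On the Ryzen 7 5800X mentioned below (8 cores / 16 threads), this would suggest 8 rather than 10-13.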
I also tried making the thread count higher, but it did not change much. The program generates about 2 words a minute. It uses ~60% CPU for around 20 seconds and then drops to between 1% and 5%. There are no issues with cooling, I checked that. It also only uses about 2 GB of RAM. I have an AMD Ryzen 7 5800X with 16 GB of RAM, and I run it with Docker on Windows 11 via Debian WSL.
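For a Docker-on-WSL setup like the one above, it's worth ruling out WSL2's own resource cap: by default the WSL2 VM (which hosts Docker) only gets a portion of the host's RAM, which could explain the low RAM usage. A sketch of a `%UserProfile%\.wslconfig` that raises the limits explicitly (the values here are assumptions for a 16 GB / 8-core machine; run `wsl --shutdown` afterwards for them to take effect):

```ini
[wsl2]
# Let the WSL2 VM (and therefore Docker) use most of the host resources.
memory=12GB
processors=8
```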
Amazed it's running at all, so thank you! But yes, it's unusably slow even when I give it more threads.
Model Name: MacBook Pro
Model Identifier: MacBookPro17,1
Model Number: MJ123LL/A
Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB
same here... very slow responses, 120-300 sec. Specs:
I've changed the thread count too, but it only saves a few seconds. Still generating word by word after several seconds ;-) Still great work!!
After resolving all the other issues (see the other issue tickets, "models could not be loaded due to localhost issue" and "only a specific model can be used"), I finally managed to get alpaca-turbo running.
But when I type a question, it takes over 130 seconds to reply with only a fraction of a word. After around 210 seconds the first sentence was finally completed.
The Docker image is running on a server with 32 GB of RAM and 16 CPU cores, which are far from stressed (RAM usage 2.9 GB, CPU 25%).
What could be the issue?
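To make reports like these comparable across machines, it can help to measure throughput in words per second rather than wall-clock impressions. A small sketch, where `generate` is a hypothetical stand-in for whatever call produces the model's text (substitute your own generation function):

```python
import time

def words_per_second(generate, prompt, n_runs=3):
    """Rough throughput benchmark.

    `generate` is any callable that takes a prompt string and returns
    the generated text (hypothetical placeholder -- not part of any
    specific API). Averages over n_runs to smooth out warm-up effects.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        text = generate(prompt)
        elapsed = time.perf_counter() - start
        # Whitespace word count as a crude proxy for tokens.
        rates.append(len(text.split()) / elapsed)
    return sum(rates) / len(rates)

# Example with a dummy "model" that just echoes 50 words:
print(words_per_second(lambda p: " ".join(["word"] * 50), "hello"))
```

"2 words a minute" from the earlier comment would correspond to roughly 0.03 words/sec, which makes the slowdown easy to quantify when testing different thread counts.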