Open auxilio-ab opened 1 year ago
I've been able to get better (though still not great) response times by going into the settings (the gear icon) and increasing self.threads from the default of 4 (I think that's what it was) to 10-13. Generation then uses a LOT more CPU, and the responses are slightly faster.
Still like 30-40 sec.
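A common rule of thumb for CPU-bound token generation (llama.cpp-style backends) is one thread per *physical* core; going past that into hyperthreads often yields diminishing or negative returns, which may explain why pushing the count higher didn't help. A minimal sketch of picking a thread count that way, assuming the logical core count is roughly double the physical one:

```python
import os

def suggest_threads():
    """Suggest a thread count for CPU-bound token generation.

    Assumption: hyperthreading roughly doubles os.cpu_count(), and
    inference usually runs best with one thread per physical core,
    so we halve the logical count (minimum 1).
    """
    logical = os.cpu_count() or 1
    return max(1, logical // 2)

print(suggest_threads())
```

On the Ryzen 7 5800X mentioned below (8 cores / 16 threads), this would suggest 8 rather than 10-13.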
I also tried making the thread count higher, but it did not change much. The program generates about 2 words a minute. It uses ~60% CPU for around 20 seconds and then drops to between 1% and 5%. There are no issues with cooling, I checked that. It also only uses about 2 GB of RAM. I have an AMD Ryzen 7 5800X with 16 GB of RAM, and I run it with Docker on Windows 11 via Debian WSL.
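For a Docker-on-WSL setup like the one above, it's worth ruling out WSL2's own resource cap: by default the WSL2 VM (which hosts Docker) only gets a portion of the host's RAM, which could explain the low RAM usage. A sketch of a `%UserProfile%\.wslconfig` that raises the limits explicitly (the values here are assumptions for a 16 GB / 8-core machine; run `wsl --shutdown` afterwards for them to take effect):

```ini
[wsl2]
# Let the WSL2 VM (and therefore Docker) use most of the host resources.
memory=12GB
processors=8
```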
Amazed it's running at all, so thank you! But yes, it's unusably slow even when I give it more threads.
Model Name: MacBook Pro
Model Identifier: MacBookPro17,1
Model Number: MJ123LL/A
Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB
same here... very slow responses, 120-300 sec. Specs:
I've changed the thread count too, but it only saves a few seconds. Still generating word by word after several seconds ;-) Still great work!!
After resolving all the other issues (see the other issue tickets, "models could not be loaded due to localhost issue" and "only a specific model can be used"), I finally managed to get alpaca-turbo running.
But when I type a question, it takes over 130 seconds to reply with only a fraction of a word. After around 210 seconds the first sentence was finally completed.
The Docker image is running on a server with 32 GB of RAM and 16 CPU cores, which are far from stressed (RAM usage 2.9 GB, CPU 25%).
What could be the issue?
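To make reports like these comparable across machines, it can help to measure throughput in words per second rather than wall-clock impressions. A small sketch, where `generate` is a hypothetical stand-in for whatever call produces the model's text (substitute your own generation function):

```python
import time

def words_per_second(generate, prompt, n_runs=3):
    """Rough throughput benchmark.

    `generate` is any callable that takes a prompt string and returns
    the generated text (hypothetical placeholder -- not part of any
    specific API). Averages over n_runs to smooth out warm-up effects.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        text = generate(prompt)
        elapsed = time.perf_counter() - start
        # Whitespace word count as a crude proxy for tokens.
        rates.append(len(text.split()) / elapsed)
    return sum(rates) / len(rates)

# Example with a dummy "model" that just echoes 50 words:
print(words_per_second(lambda p: " ".join(["word"] * 50), "hello"))
```

"2 words a minute" from the earlier comment would correspond to roughly 0.03 words/sec, which makes the slowdown easy to quantify when testing different thread counts.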