ggerganov / llama.cpp

LLM inference in C/C++

all llamacpp versions losing 20% performance after "stand by" ( sleep ) windows 11 #2332

Closed mirek190 closed 3 months ago

mirek190 commented 11 months ago

CPU: 7950X3D
OS: Windows 11
Model: 65B q4_K_M

I ran many tests with llama.cpp builds for AVX2, AVX512 and cuBLAS 12.1 (no layers on the GPU) because I recently noticed something odd about llama.cpp performance after some time. I discovered that after the system has been in standby (sleep), I lose around 20% of llama.cpp performance.

If I reboot the system and run llama.cpp, I get around 640 ms/t. If I then terminate (close) llama.cpp, put my PC into standby (sleep), wake the system and run llama.cpp again, I magically get around 750 ms/t.
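As a rough sanity check on those two figures (a sketch, not part of the original measurements): going from 640 ms/t to 750 ms/t is about 17% more time per token, or about 15% fewer tokens per second, so "around 20%" is a rounded estimate. The helper name `slowdown` is only illustrative.

```python
def slowdown(before_ms_per_token, after_ms_per_token):
    """Return (extra time per token, throughput loss) as fractions."""
    # How much longer each token takes after sleep, relative to before.
    extra_time = after_ms_per_token / before_ms_per_token - 1.0
    # How much the tokens-per-second rate drops, relative to before.
    throughput_loss = 1.0 - before_ms_per_token / after_ms_per_token
    return extra_time, throughput_loss


extra, loss = slowdown(640.0, 750.0)
print(f"time per token: +{extra:.1%}, tokens per second: -{loss:.1%}")
# prints "time per token: +17.2%, tokens per second: -14.7%"
```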

I also benchmarked the system with other applications (in case something was wrong with my Windows 11 install or CPU, but no): R23, 7-Zip, WinRAR, etc. They always give the same results, with only a +/- 1% difference between after a reboot and after sleep. The EXCEPTION is llama.cpp, which is 20% slower after system sleep for no apparent reason.

before system sleep

Screenshot 2023-07-22 215005

after system sleep

Screenshot 2023-07-22 215517

I am wondering whether I am the only one with this problem. Could someone else test this as well?

yukiteruamano commented 11 months ago

Hello!

Windows is well known for having high latency problems after entering sleep mode and starting again.

In general, the problem is due to bad initialization on power settings after coming out of sleep mode. The problem does not affect all applications and hardware equally.

For example, there are applications that have certain system integrations and prevent these power settings from affecting their access to the computational power they need (R23, for example, has this capability).

In terms of hardware, the 2nd, 3rd and 4th generation Ryzen (Zen Core arch) have been strongly affected by latency issues in Windows before (especially Windows 11), with performance losses of up to 25% in some cases, which matches the performance loss you measured in your case.

More info:

1.- https://hothardware.com/news/amd-investigating-ryzen-7000-gaming-performance-windows-11
2.- https://www.techpowerup.com/287539/amd-processors-lose-15-gaming-performance-with-windows-11-l3-cache-latency-tripled

mirek190 commented 11 months ago

The problems you linked are already easily solved by a BIOS setting that changes the priority so that core0 is the main core with the L3 cache. I have had that changed from the beginning, and you can easily check whether threads 0-15 are loaded first ... and they are, both before and after sleep.

That problem was my first thought.

That's why I'm so confused ... No other application slows down except llama.cpp.

As far as I know, the latency problem is very bad with Intel 12xx and 13xx series CPUs.

How can I test latency?

UPDATE:

I did some research and found out that both Intel and AMD platforms are affected by this problem: RAM loses 10% - 20% of its READ speed after sleep on Windows and Linux ... and has done so for at least a decade ....

https://www.reddit.com/r/AMDHelp/comments/10ioxee/memory_performance_loss_on_am5_platform_after/
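For anyone who wants to compare before and after sleep without a dedicated benchmark tool, here is a minimal sketch of a read-only memory pass using only the Python standard library. It is an assumption-laden proxy, not a calibrated measurement: it is single-threaded, it relies on CPython scanning the buffer quickly when searching for a byte that is absent, and the function name `read_bandwidth_gib_s` and its defaults are made up for illustration. Absolute numbers will be well below what dedicated tools report; only the ratio between a run after reboot and a run after sleep is of interest.

```python
import time


def read_bandwidth_gib_s(size_mib=256, passes=5):
    """Estimate single-threaded read throughput over a large buffer, in GiB/s."""
    size = size_mib * 1024 * 1024
    # Fill with a non-zero byte so the pages are really backed by RAM.
    buf = bytearray(b"\x01") * size
    best = float("inf")
    for _ in range(passes):
        start = time.perf_counter()
        # Searching for a byte that never occurs forces a full read of the buffer.
        found = buf.find(b"\x02")
        elapsed = time.perf_counter() - start
        assert found == -1
        best = min(best, elapsed)  # keep the fastest pass to reduce noise
    return (size / 2**30) / best


if __name__ == "__main__":
    print(f"approx. read throughput: {read_bandwidth_gib_s():.1f} GiB/s")
```

Run it once after a fresh reboot and once after waking from sleep, with the buffer much larger than the CPU cache (the 7950X3D has a large L3, so 256 MiB or more is a reasonable choice).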

BUT there is a solution: instead of sleep we can use HIBERNATE, which is not affected and is available in the system by default .... Nowadays we have fast NVMe SSDs, so I do not see any difference between hibernation and sleep despite having 64 GB of RAM: waking up and going to sleep / hibernation take the same time.

Before sleep ( reboot )

after-reboot

After sleep

after-sleep

Using hibernation instead of sleep fully solves the problem.
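For anyone who wants to script the switch to hibernation, a minimal sketch follows. It assumes the stock `shutdown /h` command on Windows and `systemctl hibernate` on systemd-based Linux, and that hibernation is already enabled on the machine (on Windows, `powercfg /hibernate on` from an elevated prompt). The function names are hypothetical, and nothing here is specific to llama.cpp.

```python
import platform
import subprocess


def hibernate_command(system=None):
    """Return the command that hibernates the given OS (default: current OS)."""
    system = system or platform.system()
    if system == "Windows":
        return ["shutdown", "/h"]
    if system == "Linux":
        # Assumes a systemd-based distribution with hibernation configured.
        return ["systemctl", "hibernate"]
    raise NotImplementedError(f"no hibernate command known for {system}")


def hibernate():
    """Hibernate the machine right now; call this only when you mean it."""
    subprocess.run(hibernate_command(), check=True)
```

The command is built separately from the call that runs it, so the script can be inspected or dry-run without actually hibernating the machine.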

yukiteruamano commented 11 months ago

Windows being Windows, nothing new under the sun.

That said, it is better to reboot/shutdown/hibernate than to suspend... you avoid the performance loss.

mirek190 commented 11 months ago

That problem is not only on Windows; Linux is affected as well.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.