KSemenenko opened 2 years ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.
Author: KSemenenko
Assignees: -
Labels: area-System.Threading, tenet-performance, untriaged
Milestone: -
I don't think Thread.Sleep is helpful here. Thread.Sleep is just as imprecise.
Quick test:
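using System;
using System.Diagnostics;
using System.Threading;

var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 10; i++)
{
    Thread.Sleep(1); // asks for 1 ms; the actual delay depends on the OS timer resolution
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
}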
Results (Windows x64, elapsed milliseconds after each iteration): 2 17 33 49 64 80 96 111 126 141
I'm not sure this is something we can reliably provide as an API, due to differing support across hardware and operating systems/platforms.
Maybe something could be provided with some sort of query that indicates whether such functionality is supported. But even then, OS level thread-scheduling and power management functions might get in the way.
For reference, on Windows essentially anything needing a resolution finer than 16 ms (the "default" time slice) needs to use the "multimedia" timers and opt in. There are then a number of limitations and restrictions around them, and it impacts the scheduling/priority of your entire process (and potentially other processes as well).
Some platforms, such as the Raspberry Pi, WASM, Android/iOS in "power saving mode", etc., might not support running events this frequently.
For me, a new type like MultimediaTimer that can fire at 1 ms intervals would be good. Of course it would use the system functions, and of course it would have an impact on battery life, but using such a timer would be a conscious choice by the developers who need it.
It would be nice if such a feature could be enabled via timeBeginPeriod/timeEndPeriod pairs or the not-so-documented NtSetTimerResolution.
To my surprise, after setting the desired resolution to 1 ms via the above APIs, Thread.Sleep resolution does seem to be affected, but System.Threading.Timer instances still fire with a resolution above 10 ms.
OS: Windows 11, .NET 6.0.3
My understanding so far:
Default settings:
  Number of times one can call Thread.Sleep(1) per second: ~64
  Number of GetTickCount64 changes per second: ~64
After setting 1 ms resolution via the above APIs:
  Number of times one can call Thread.Sleep(1) per second: ~666
  Number of GetTickCount64 changes per second: ~64
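Converting those rates into periods is plain arithmetic on the numbers reported here and on the quick test results from earlier in the thread:

$$
\frac{1000\ \text{ms}}{64} = 15.625\ \text{ms}, \qquad
\frac{1000\ \text{ms}}{666} \approx 1.5\ \text{ms}, \qquad
\frac{(141 - 2)\ \text{ms}}{9} \approx 15.4\ \text{ms}
$$

So with the defaults, both a Thread.Sleep(1) call and a GetTickCount64 step average about 15.6 ms, in line with the 15.6 ms Windows default resolution mentioned in the issue description. After the change, Thread.Sleep(1) averages roughly 1.5 ms, but GetTickCount64 still advances only about every 15.6 ms.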
So even though the sleep in the timer thread is likely affected correctly by these APIs:
void ThreadpoolMgr::TimerThreadFire()
..
SleepEx(timeout, TRUE);
The problem is that the count used to determine how much time has elapsed (currentTime) has a resolution of ~10-15 ms, because GetTickCount/GetTickCount64 is used:
DWORD ThreadpoolMgr::FireTimers()
DWORD currentTime = GetTickCount();
...
if (TimeExpired(LastTickCount, currentTime, timerInfo->FiringTime))
{
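A toy model, in C, of why precise sleeps alone do not help: the simulated timer thread wakes up exactly on schedule, but compares against a clock that only moves in whole ticks, the way a GetTickCount-style counter does. simulated_fire_time_us is a hypothetical helper for illustration, not the runtime's FireTimers logic, and it ignores everything else the real timer thread does.

```c
#include <stdint.h>

/* Toy model, not the runtime's code: the timer thread wakes up precisely
   every `sleep_us` microseconds, but decides whether a timer is due by
   reading a clock that only advances in whole steps of `tick_us` (like a
   tick count). Returns the real time, in microseconds since start, at which
   a timer due at `due_us` actually fires. Requires sleep_us > 0, tick_us > 0. */
static uint64_t simulated_fire_time_us(uint64_t due_us, uint64_t sleep_us, uint64_t tick_us)
{
    uint64_t now = 0;
    for (;;) {
        now += sleep_us;                               /* precise wake-up */
        uint64_t observed = (now / tick_us) * tick_us; /* coarse clock reading */
        if (observed >= due_us)
            return now;
    }
}
```

With a 1 ms sleep and a 15.625 ms tick, a timer due at 1 ms is not seen as expired until the wake-up at 16 ms; with a 1 µs tick the same timer fires at 1 ms.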
Would it be possible to replace DWORD currentTime = GetTickCount(); with QPC counters (QueryPerformanceCounter) when available, or via configuration?
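One way to see the difference this question is about is to measure a clock's effective step size. The C sketch below assumes Linux, where CLOCK_MONOTONIC_COARSE is a faster, lower-precision clock typically updated only at the kernel tick rate (loosely comparable to GetTickCount64), and CLOCK_MONOTONIC is high resolution (loosely comparable to QueryPerformanceCounter). read_ns and observed_step_ns are hypothetical helpers, and the values they report depend on the kernel configuration and hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Reads clock `id` in nanoseconds, or returns -1 if the clock is unavailable. */
static int64_t read_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: spins until clock `id` reports a new value, `samples`
   times, and returns the smallest step seen (the clock's effective
   granularity), or -1 if the clock cannot be read. Preemption can only make
   an individual step look larger, which is why the minimum is kept. */
static int64_t observed_step_ns(clockid_t id, int samples)
{
    int64_t best = INT64_MAX;
    for (int i = 0; i < samples; i++) {
        int64_t t0 = read_ns(id);
        int64_t t1;
        if (t0 < 0)
            return -1;
        do {
            t1 = read_ns(id);
        } while (t1 == t0);
        if (t1 < 0)
            return -1;
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Comparing observed_step_ns(CLOCK_MONOTONIC, 100) with observed_step_ns(CLOCK_MONOTONIC_COARSE, 5) on the same machine shows the two granularities side by side.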
NtSetTimerResolution is undocumented -> we cannot use it.
But then how is it done in programs that work with MIDI files, for example? Or in games?
Also, if it's not a documented feature, can't Microsoft document it?
Many games and other multimedia apps don't use things like NtSetTimerResolution; they actively avoid sleep and other "expensive" operations.
Instead they use things like (varying from scenario to scenario, of course):
- QueryPerformanceFrequency and QueryPerformanceCounter (or CLOCK_MONOTONIC on Linux)
- CreateEventW and WaitForSingleObject (or WaitOnAddress)
- timeBeginPeriod and timeEndPeriod
Drivers have access to additional APIs (like ExSetTimerResolution) and often toggle various settings when you create your first DirectX/Vulkan/OpenGL device.
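A minimal C sketch of the CLOCK_MONOTONIC side of the list above, assuming Linux/POSIX: read a monotonic clock and sleep to absolute deadlines. run_periodic and its helpers are hypothetical names; on Windows the clock side would be QueryPerformanceCounter instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

static void ts_add_ns(struct timespec *ts, long ns)
{
    ts->tv_nsec += ns;
    while (ts->tv_nsec >= 1000000000L) {
        ts->tv_nsec -= 1000000000L;
        ts->tv_sec += 1;
    }
}

/* Runs `ticks` iterations with period `period_ns` on CLOCK_MONOTONIC.
   Each wait targets an absolute deadline (TIMER_ABSTIME), so a late wake-up
   delays only that tick instead of pushing every later tick back.
   Returns the worst lateness observed, in nanoseconds. */
static int64_t run_periodic(int ticks, long period_ns)
{
    struct timespec next, now;
    int64_t worst_late_ns = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < ticks; i++) {
        ts_add_ns(&next, period_ns);
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR) {
            /* interrupted by a signal: retry the same absolute deadline */
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst_late_ns)
            worst_late_ns = late;
        /* periodic work would run here */
    }
    return worst_late_ns;
}
```

If the work ever falls behind by more than a period, the next deadlines are already in the past and clock_nanosleep returns immediately, so the loop catches up rather than drifting.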
How about emitting nops to create the delay?
A nop takes decoding time but generally no execution time. You wouldn't want to execute a million nops just to wait 0.25-1 ms.
Typically, when you want to "delay" for a very brief period of time, you emit pause (or yield on Arm64). On modern CPUs this waits for about 100-140 clock cycles and correctly plays into power management/efficiency settings (which is important for mobile, laptops, and other scenarios).
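A minimal C sketch of a short busy-wait that issues the pause/yield hint on every iteration. It assumes GCC or Clang (for _mm_pause from <immintrin.h> and inline asm) and a POSIX CLOCK_MONOTONIC clock; spin_until and CPU_RELAX are hypothetical names. Spinning like this keeps a core busy, so it only makes sense for very short waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()                   /* x86/x64 PAUSE */
#elif defined(__aarch64__)
#define CPU_RELAX() __asm__ __volatile__("yield") /* Arm64 YIELD */
#else
#define CPU_RELAX() ((void)0)                     /* no hint available */
#endif

static int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-waits until CLOCK_MONOTONIC reaches deadline_ns, issuing the CPU's
   spin-wait hint on each iteration. The thread never voluntarily gives up
   its time slice (the OS can still preempt it). Returns the iteration count. */
static uint64_t spin_until(int64_t deadline_ns)
{
    uint64_t spins = 0;
    while (mono_ns() < deadline_ns) {
        CPU_RELAX();
        spins++;
    }
    return spins;
}
```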
If you need to wait for a longer period of time without giving up your time slice, you'd use system-level fences. This is what DirectX 12 and Vulkan use in coordination with GPU fences.
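// Polling loop: intervalMs, callback and _running are defined elsewhere.
var stopwatch = Stopwatch.StartNew();
long nextDue = intervalMs;
while (_running)
{
    if (stopwatch.ElapsedMilliseconds >= nextDue)
    {
        callback();
        // Schedule from the previous due time rather than from "now", so the
        // callback's run time and late wake-ups don't accumulate as drift.
        nextDue += intervalMs;
    }
    if (!Thread.Yield())
        Thread.Sleep(0);
}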
On Windows you can also use a waitable timer object with CREATE_WAITABLE_TIMER_HIGH_RESOLUTION. However, this only works on Windows 10 version 1803 and later, and the actual timer resolution depends on the underlying hardware.
See https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createwaitabletimerexw
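Linux has a comparable kernel timer object, timerfd, that a thread can block on. The C sketch below assumes Linux, and timerfd_ticks is a hypothetical helper; as with the Windows high-resolution waitable timer, the resolution actually achieved depends on the kernel and hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Arms a periodic CLOCK_MONOTONIC timerfd (period_ns > 0) and blocks in
   read() `ticks` times. Returns the total number of expirations reported,
   or -1 on error. A single read() reporting more than 1 expiration means
   periods elapsed while the thread was not waiting. */
static long timerfd_ticks(int ticks, long period_ns)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    struct itimerspec spec;
    spec.it_interval.tv_sec = period_ns / 1000000000L;
    spec.it_interval.tv_nsec = period_ns % 1000000000L;
    spec.it_value = spec.it_interval; /* first expiration after one period */
    if (timerfd_settime(fd, 0, &spec, NULL) < 0) {
        close(fd);
        return -1;
    }

    long total = 0;
    for (int i = 0; i < ticks; i++) {
        uint64_t expirations;
        if (read(fd, &expirations, sizeof expirations) != (ssize_t)sizeof expirations) {
            close(fd);
            return -1;
        }
        total += (long)expirations;
    }
    close(fd);
    return total;
}
```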
if (!Thread.Yield()) Thread.Sleep(0);
You typically don't want to use Thread.Yield() or Thread.Sleep(0): both relinquish the remainder of the allocated time slice and can cause you to not run again until the OS scheduler first goes through the queue of equal- and higher-priority threads waiting to run.
The pause (x86/x64) and yield (Arm32/Arm64) instructions are different from these.
This is also why in games you may not want to use the various built-in System.Threading.* locking primitives (like SpinLock), as these frequently use Yield or Sleep under the hood. Lower-level graphics frameworks (like DX12MA) frequently use SRWLOCK (Windows) instead, as it provides the relevant behavior without the negative side effects of relinquishing the time slice, and is thus suitable for games and other multimedia applications.
Yes, and also the time for callback execution. It's a timer, so it should generate events without waiting until the previous callback has finished.
So the Windows API is the only blocker?
This is not just a Windows limitation, this is a limitation of both hardware and software for almost any platform that might be targeted.
Most platforms are not designed or oriented around real-time execution. Targeting sub-millisecond execution is not something that has real guarantees on most platforms, and it often requires additional care and work to make even partially possible in the first place.
NtSetTimerResolution is undocumented -> we cannot use it.
I'm not advocating that the runtime uses these APIs to change system timers.
The undocumented NtSetTimerResolution, or the documented timeBeginPeriod and timeEndPeriod, can be called by the user via P/Invoke.
What the user cannot change is the runtime's use of low-resolution GetTickCount/GetTickCount64 in its system timer implementation, hence the ask whether QueryPerformanceCounter-based timers could be used instead, for example:
There are many more considerations than simply increasing the timer resolution or utilizing QPC within a timer.
This is something that is going to be application-specific and which needs to be considered at the very least thread-wide for a given process, but more likely also managed even higher, at the general process level.
It is not something which can be trivially provided by the BCL. The BCL does, however, already expose all the relevant tools such that an interested user could roll their own scenario specific timer after having invoked the relevant platform specific APIs.
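A sketch of one such user-level building block, assuming POSIX (Linux) APIs and hypothetical names: block in the kernel until shortly before the deadline, then spin on a monotonic clock with the pause/yield hint for the rest, so the thread neither oversleeps the deadline nor gives up its time slice via Yield/Sleep(0) at the critical moment. On Windows the clock would be QueryPerformanceCounter; the spin_margin_ns value is an assumption that has to be tuned per machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#elif defined(__aarch64__)
#define CPU_RELAX() __asm__ __volatile__("yield")
#else
#define CPU_RELAX() ((void)0)
#endif

static int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static struct timespec to_timespec(int64_t ns)
{
    struct timespec ts = { .tv_sec = (time_t)(ns / 1000000000LL),
                           .tv_nsec = (long)(ns % 1000000000LL) };
    return ts;
}

/* Hypothetical hybrid wait: sleep in the kernel until `spin_margin_ns`
   before the deadline, then busy-wait the remainder with a CPU pause hint.
   A larger margin costs more CPU but tolerates later kernel wake-ups. */
static void hybrid_wait_until(int64_t deadline_ns, int64_t spin_margin_ns)
{
    int64_t sleep_until = deadline_ns - spin_margin_ns;
    if (sleep_until > mono_ns()) {
        struct timespec ts = to_timespec(sleep_until);
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR) {
            /* interrupted by a signal: retry the same absolute deadline */
        }
    }
    while (mono_ns() < deadline_ns)
        CPU_RELAX();
}
```

A periodic timer can then be built by calling hybrid_wait_until with deadlines that advance by a fixed period, so lateness in one tick does not shift the following ones.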
Hello! If you call the timer API from a driver (ExSetTimerResolution), everything works well. The only program that uses this call is DPC Latency Checker: just open it and the function starts working.
But I have a question. My Z790 platform has three timer resolutions with the TSC tick: 0.997, 0.999 and 1000. I want to use only 1000, but I'm tired of rebooting again and again to get this value. I want a stable 1 ms on my system, and I don't want to use the RTC tick (useplatformtick yes) to get 1 ms. How can I get a stable 1000 µs (not 0.997, 0.999 or similar) with the TSC tick?
@Slendermid, it sounds like it could be related to "clock spread spectrum in the BIOS settings", though I'm not sure. Try disabling it and retesting.
I've considered that, but Z690/Z790 boards with DDR4 memory don't have an option to disable BCLK spread spectrum, and a friend with a Z790 DDR5 board gets the same values. Also, the base CPU speed is 3.42 rather than 3.40 as on AM4/Z390/Z490; if I enable EIST it becomes 3.40, but that doesn't resolve the problem anyway. I tried writing a driver with a 1001 timer resolution, and it gives me a stable 1000, but after opening another program it decreases to 0.997. Is there any function to lock the timer at a stable 1000 µs?
write a driver
You're trying to write a driver in C#? Good luck with that - you'd be better off with C/C++ or Rust.
Maybe have any function to stable lock timer to 1000us?
This is an OS/hardware layer concern, and C# doesn't natively provide a way to even get a timer at that resolution, and so we're unlikely to be able to provide something like that in the first place. I'm not sure such a thing would even actually be provided (at least not on standard consumer hardware/OS) - your driver/whatever is normally expected to handle the actual delta that occurred, not assume it was completely static.
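As noted above, code is normally expected to handle the delta that actually occurred. A common way to do that is a fixed-timestep accumulator; the C sketch below uses hypothetical names and is a general game-loop pattern, not an OS or runtime API.

```c
#include <stdint.h>

/* Hypothetical fixed-timestep accumulator. step_us is the nominal period;
   carried_us is elapsed time not yet consumed by a whole step. */
typedef struct {
    int64_t step_us;
    int64_t carried_us;
} fixed_step;

/* Adds the delta actually measured since the previous call (assumed >= 0)
   and returns how many whole nominal steps to run now. Leftover time is
   carried into the next call, so ticks that arrive at 997 us or 1015 us
   instead of exactly 1000 us neither lose nor double-count time. */
static int fixed_step_advance(fixed_step *fs, int64_t measured_delta_us)
{
    fs->carried_us += measured_delta_us;
    int steps = (int)(fs->carried_us / fs->step_us);
    fs->carried_us -= (int64_t)steps * fs->step_us;
    return steps;
}
```

Feeding it deltas of 997, 997 and 1015 µs with a 1000 µs step yields 0, 1 and 2 steps, with 9 µs carried over: 3 steps for 3009 µs of real time.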
My driver code (I'm not a good coder, though):
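#include <ntddk.h>
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) { UNREFERENCED_PARAMETER(DriverObject); UNREFERENCED_PARAMETER(RegistryPath); ExSetTimerResolution(10000, TRUE); /* 10000 x 100 ns = 1 ms */ return STATUS_SUCCESS; }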
I know it's a hardware issue or something like that, but if the system can randomly get 1000 or 0.9999 (instead of 0.9966), I think we can lock it at 1000. If the system has chosen a 0.9966 timer resolution, I can stabilize it at 1000, but if another program requests the timer (1 ms), it decreases to 0.9966. I prefer 0.9999 or 1000 because those values have the lowest kernel timer latency (1-3 µs vs ~950 µs). Also, the RTC tick, with an always-stable 1000 timer resolution, gives me 1000 µs kernel latency.
Some Microsoft folks just did this for Go on Windows: https://devblogs.microsoft.com/go/high-resolution-timers-windows/ (and it seems it has worked for Go on Linux for a long time). I suppose .NET could implement a similar approach as well.
So maybe in .NET 10 then ? 😉
As per the above, there is a "lot" of complexity in doing this and doing it correctly. It's also not something that can be strictly guaranteed across all systems/hardware and the places it is needed are a bit more niche.
Due to all of that, this isn't something I can see getting prioritized for .NET 10. However, there is nothing preventing a 3rd party library from being created that provides the functionality in the meantime.
System timers that can be called once every 1 millisecond.
This is very difficult to do at the moment, especially when it comes to cross-platform code. Perhaps we should add such a high-precision system timer; it would come in handy for games or for playing MIDI files.
As you can see, for small intervals, 15.6 ms is the best average. As you know, this is the standard Windows system timer resolution, which is described in detail in the Microsoft document Timers, Timer Resolution, and Development of Efficient Code (http://download.microsoft.com/download/3/0/2/3027d574-c433-412a-a8b6-5e0a75d5b237/timer-resolution.docx)
So, as you can see, it cannot be done without using operating-system-specific functions, or by implementing it through an infinite loop and Task.Wait/Thread.Sleep.