dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.24k stars 4.73k forks source link

Optimise Process.PrivateMemorySize64 and similar APIs on Windows #76176

Open hamarb123 opened 2 years ago

hamarb123 commented 2 years ago

Description

If someone wants to get an up-to-date value for Process.PrivateMemorySize64, then they have to call Process.Refresh. I noticed that this was really slow and I only wanted a value for PrivateMemorySize64.

Windows has an API called GetProcessMemoryInfo which we should use to get these value since it is much faster. We can open the process with the OpenProcess API and store it in a safehandle & use CloseHandle to close it.

Configuration

.NET 6.0.9 x64 Windows 10 21H2 v10.0.19044.2006 x64

Regression?

No

Data

The following bits of code execute in a similar amount of time:

var p = Process.GetCurrentProcess();
for (int i = 0; i < 2000; i++) { p.Refresh(); _ = p.PrivateMemorySize64; }
//appropriate p/invoke required for the below
var handle = OpenProcess(0x1000 /*PROCESS_QUERY_LIMITED_INFORMATION*/, false, (uint)Process.GetCurrentProcess().Id);
for (int i = 0; i < 10_000_000; i++) GetProcessMemoryInfo(handle, out var counters, (uint)Marshal.SizeOf<PROCESS_MEMORY_COUNTERS_EX>());
CloseHandle(handle);
//appropriate p/invoke required for the below
//this one does't cache the handle but is still very fast
var pid = Process.GetCurrentProcess().Id;
for (int i = 0; i < 3_000_000; i++)
{
    var handle = OpenProcess(0x1000 /*PROCESS_QUERY_LIMITED_INFORMATION*/, false, (uint)pid);
    if (handle == IntPtr.Zero)
    {
        throw new Exception("Failed to open process handle to read current process information.");
    }
    GetProcessMemoryInfo(handle, out var counters, (uint)Marshal.SizeOf<PROCESS_MEMORY_COUNTERS_EX>());
    CloseHandle(handle);
}

Which means we can do it ~5000x faster (for the speeds on my machine anyway).

Analysis

It seems like we retrieve info for every process every time we do anything (https://github.com/dotnet/runtime/blob/a5f3676cc71e176084f0f7f1f6beeecd86fbeafc/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Windows.cs). We can probably also apply similar optimisations to the other methods - I'd also be interested in investigating the other functions and how they can be optimised once we get this one approved - I'd like to do the different functions as seperate PRs if I get approved to do it.

We should use GetProcessMemoryInfo for the info that can be gathered from it. I'd like to make a PR for this if it is approved by the area-owners which would be merged by .NET 8 ideally (please add me to assignees also on this and the PR if approved, thanks).

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process See info in area-owners.md if you want to be subscribed.

Issue Details
### Description If someone wants to get an up-to-date value for `Process.PrivateMemorySize64`, then they have to call `Process.Refresh`. I noticed that this was really slow and I only wanted a value for `PrivateMemorySize64`. Windows has an API called `GetProcessMemoryInfo` which we should use to get these value since it is much faster. We can open the process with the `OpenProcess` API and store it in a safehandle & use `CloseHandle` to close it. ### Configuration .NET 6.0.9 x64 Windows 10 21H2 v10.0.19044.2006 x64 ### Regression? No ### Data The following bits of code execute in a similar amount of time: ```cs var p = Process.GetCurrentProcess(); for (int i = 0; i < 2000; i++) { p.Refresh(); _ = p.PrivateMemorySize64; } ``` ```cs //appropriate p/invoke required for the below var handle = OpenProcess(0x1000 /*PROCESS_QUERY_LIMITED_INFORMATION*/, false, (uint)Process.GetCurrentProcess().Id); for (int i = 0; i < 10_000_000; i++) GetProcessMemoryInfo(handle, out var counters, (uint)Marshal.SizeOf()); CloseHandle(handle); ``` Which means we can do it ~5000x faster (for the speeds on my machine anyway). ### Analysis It seems like we retrieve info for every process every time we do anything (https://github.com/dotnet/runtime/blob/a5f3676cc71e176084f0f7f1f6beeecd86fbeafc/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Windows.cs). We can probably also apply similar optimisations to the other methods - I'd also be interested in investigating the other functions and how they can be optimised once we get this one approved - I'd like to do the different functions as seperate PRs if I get approved to do it. We should use `GetProcessMemoryInfo` for the info that can be gathered from it. I'd like to make a PR for this if it is approved by the area-owners.
Author: hamarb123
Assignees: -
Labels: `area-System.Diagnostics.Process`, `tenet-performance`
Milestone: -
hamarb123 commented 2 years ago

(also, when on earth did I add that label lol, and how do i do it in the future?)

Gnbrkm41 commented 2 years ago

(also, when on earth did I add that label lol, and how do i do it in the future?)

Must've used the performance issue template. 😁

While it sounds like a promising change it definitely would break the contract that 'the process component is a snapshot of the process resource at the time they are associated.'. Perhaps it would be better if exposed as a new API (dunno, maybe GetCurrentPrivateMemorySize64)? In which case, it also might be a good idea to expose similar APIs for other properties like processor times and other memory related properties.

filipnavara commented 2 years ago

In which case, it also might be a good idea to expose similar APIs for other properties like processor times and other memory related properties.

Similar properties are already exposed in System.Environment so that place would be a natural fit. For example, there's Environment.WorkingSet that exposes a value from PROCESS_MEMORY_COUNTERS.

hamarb123 commented 2 years ago

Similar properties are already exposed in System.Environment so that place would be a natural fit. For example, there's Environment.WorkingSet that exposes a value from PROCESS_MEMORY_COUNTERS.

GetProcessMemoryInfo can get it for any process, so we'd want a new set of APIs or to put it with Process. I was unaware that Process was a snapshot at a particular point in time (is this completely true on unix platforms?).

Perhaps we could add a new property to Process and ProcessStartInfo that allows the user to change the mode of caching between something like Snapshot, Current, and CurrentCached. If we did this we would want to update all of the relevant APIs at once. Is there a complete list of APIs that would need to be changed, presumably just everything that is changed by Refresh right?

adamsitnik commented 2 years ago

@hamarb123 thank you for your proposal!

If I understand correctly you would like to change Refresh to update only the properties that have been used so far by using specialized, cheaper sys-calls?

Do we currently have the possibility to verify whether PrivateMemorySize64 was the only property being initialized so far?

hamarb123 commented 2 years ago

If I understand correctly you would like to change Refresh to update only the properties that have been used so far by using specialized, cheaper sys-calls?

Do we currently have the possibility to verify whether PrivateMemorySize64 was the only property being initialized so far?

Currently when we do Refresh() we get a process snapshot, which requires iterating through every process on the system (I think this is the only way to get a snapshot) and we extract all the information at once from it (which means all the info is consistent with itself). This is why it's so much slower.

However I think there are probably cases where people want this behaviour so I think we should keep this as an option (which is my proposed Snapshot caching mode).

Note: It is technically a breaking change to change this snapshot behaviour to not be the default, 'the process component is a snapshot of the process resource at the time they are associated.'. So I'm not sure if we want to keep it as the default (but it wouldn't be a breaking change afaik if we do, but also many new users would miss out on this faster code), but we should definitely keep it as an option.

But most of the time I don't think people need more than a few changing properties from a Process object, and if you only need 1, then this overhead is ridiculous. Hence why I propose 2 new modes Current and CurrentCached; CurrentCached would be the same as Current except that it will cache the values until a refresh and Current will return an up-to-date value every time. For CurrentCached, refresh will just clear the cache and not get anything, and for Current it will do nothing.

The Current mode will use APIs per property (or groups of properties if they come all at once e.g. as the memory information does on Windows at least). Since these APIs aren't worried about getting all the process' info at once in a completely consistent state, they are much faster - and if we can't find a way to get a particular piece of information on its own, we can fall back to getting it the way we get it for snapshots.

In terms of CurrentCached re the groups of properties, I think we should only cache the properties we have already read before (since the last refresh) and the property we're trying to reading; so the user doesn't get a surprising cached value from ages before they think it should be from (due to calculating another property) - but maybe this would be an option too since it's just a change in the caching mechanism and wouldn't be too hard to implement.

GSPP commented 2 years ago

@hamarb123 let's say, a developer wants to query a related set of performance variables such as the three CPU numbers (total, user, system). You propose to query the "group" from the kernel just once instead of three times. How would that work? This seems to be in conflict with the idea of the Current mode to not cache.

hamarb123 commented 2 years ago

That's a good point. I think the Current and CurrentCached modes are still valuable as-is, but we probably should have a mode to store all of the grouped values at once, note: this will (probably) also require deciding which values those are and documenting them if we want to avoid undefined behaviour.

It could be called Grouped, here's how it could work: whenever you request a value in the group, it will get all of the values for the group and cache them together; when you request that value or any other value in the group next time, it will return the cached value (and the cache can be reset with Refresh() obviously).

Note: instead of locking in the groups permanently, we could add a bitfield enum which lists each property (of type ulong to make sure we don't run out), and a (static?) method which will return which properties are in the group of a given property, and then developers could query this when they want to make sure they are getting a new enough value in Grouped mode (and any other mode that may have special behaviour based on groups). I think I'm leaning towards this instead of deciding what the groups are and documenting them, in case we found a way to group more properties, or ungroup them into faster seperate calls in the future.

In-depth examples of each mode: Below is an example of how the different modes should work when running the following code (with a gap of any arbitrary length between each function call): ```cs //Process p = ...; p.Refresh(); _ = p.PrivateMemorySize64; _ = p.VirtualMemorySize64; _ = p.BasePriority; _ = p.PrivateMemorySize64; ``` (For the purpose of the explanation below) Assume that `PrivateMemorySize64` and `VirtualMemorySize64` are in a group, `BasePriority` is in a group on its own, these are all of the properties that are available, and all of these properties can be read at once in `Snapshot` mode. `Snapshot` (which is currently the only mode): - `Refresh()` - the cache is cleared. - `PrivateMemorySize64` - every property is read about the process all at once in a snapshot and saved to cache, value is returned from cache - `VirtualMemorySize64` - returned from cache - `BasePriority` - returned from cache - `PrivateMemorySize64` - returned from cache - Note: some properties are not currently a part of the snapshot, we should document which ones specifically when we make this mode and probably leave the behaviour as-is. `Current`: - `Refresh()` - does nothing. - `PrivateMemorySize64` - read this property (and any others that happen to come with it) and return just this one - `VirtualMemorySize64` - read this property (and any others that happen to come with it) and return just this one - `BasePriority` - read this property (and any others that happen to come with it) and return just this one - `PrivateMemorySize64` - read this property (and any others that happen to come with it) and return just this one - Note: this mode guarantees 'the property is at least as new as when you called the property' `CurrentCached`: - `Refresh()` - the cache is cleared. - `PrivateMemorySize64` - read this property (and any others that happen to come with it e.g. `VirtualMemorySize64`), save just this one to the cache and return it - `VirtualMemorySize64` - read this property (and any others that happen to come with it e.g. `PrivateMemorySize64`), save this one and `PrivateMemorySize64` (since it has been cached already) to the cache and and return this one - `BasePriority` - read this property (and any others that happen to come with it), save just this one to the cache and return it - `PrivateMemorySize64` - read this property from the cache and return it - Note: this one intentionally doesn't cache properties that haven't been read since refresh since it would guarantee 'the property is at least as new as the first time you read it since a `Refresh()`' `Grouped`: - `Refresh()` - the cache is cleared. - `PrivateMemorySize64` - read `PrivateMemorySize64` and `VirtualMemorySize64`, cache both of them, return this property - `VirtualMemorySize64` - read this property from the cache - `BasePriority` - read this property, cache it and return it - `PrivateMemorySize64` - read this property from the cache - Note: this mode guarantees 'when properties can be read in groups to save native calls, they will be'

I would also like to note: the way I would implement it, these modes would all share most of the code (other than Snapshot, which is inherently quite different) and would be easy to maintain.

adamsitnik commented 2 years ago

Currently when we do Refresh() we get a process snapshot, which requires iterating through every process on the system (I think this is the only way to get a snapshot) and we extract all the information at once from it (which means all the info is consistent with itself). This is why it's so much slower.

This is definitely expensive. I believe that we should try a simpler (but not most optimal) approach first:

When processId is known:

https://github.com/dotnet/runtime/blob/68cf2474e2503a11e3db3f305f8a61b76e1fef3e/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Windows.cs#L108

Don't fetch all processes information:

https://github.com/dotnet/runtime/blob/68cf2474e2503a11e3db3f305f8a61b76e1fef3e/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Win32.cs#L290-L294

and then read all fields at once:

https://github.com/dotnet/runtime/blob/68cf2474e2503a11e3db3f305f8a61b76e1fef3e/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Win32.cs#L353-L369

Just use corresponding sys-calls for getting the info for single process and assign all of the information at once:

This should be relatively easier to implement as we would not need to introduce a new state flag for "incomplete process information": https://github.com/dotnet/runtime/blob/68cf2474e2503a11e3db3f305f8a61b76e1fef3e/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.cs#L498

and hence avoid the need for cache invalidation (we all know how it usually ends ;) ) and even more differences between Windows and other OS-es implementation for `ProcessManager:.

image

This could work an order of magnitude faster and be simple (assuming there are no major windows 7 vs 7+ differences) and straightforward to implement, without breaking and risky changes.

@hamarb123 what do you think?

hamarb123 commented 2 years ago

2 issues with this, but otherwise I think it is probably fine - but I'd still prefer to go the route with the different modes (would need to get the additional APIs required with this approved first obviously).

Firstly, would technically be a breaking change as stated here.

Secondly, it's unclear to me how the caching would work, currently you have to call Refresh() to get current info. Are you suggesting we basically implement the Grouped mode from above? I'd be fine with this. Or do you mean the Current mode? This has the issue that grouped properties are requested brand new every time.

Also, the reason I say the different modes would be easy to maintain is because we could do it basically like this (names can be changed obviously, and below is just a rough sketch of how it would work):

I think this design would be extensible, maintainable, and fast.