Open hamarb123 opened 2 years ago
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process See info in area-owners.md if you want to be subscribed.
Author: | hamarb123 |
---|---|
Assignees: | - |
Labels: | `area-System.Diagnostics.Process`, `tenet-performance` |
Milestone: | - |
(also, when on earth did I add that label lol, and how do i do it in the future?)
(also, when on earth did I add that label lol, and how do i do it in the future?)
Must've used the performance issue template. 😁
While it sounds like a promising change it definitely would break the contract that 'the process component is a snapshot of the process resource at the time they are associated.'. Perhaps it would be better if exposed as a new API (dunno, maybe GetCurrentPrivateMemorySize64
)? In which case, it also might be a good idea to expose similar APIs for other properties like processor times and other memory related properties.
In which case, it also might be a good idea to expose similar APIs for other properties like processor times and other memory related properties.
Similar properties are already exposed in System.Environment
so that place would be a natural fit. For example, there's Environment.WorkingSet
that exposes a value from PROCESS_MEMORY_COUNTERS
.
Similar properties are already exposed in
System.Environment
so that place would be a natural fit. For example, there'sEnvironment.WorkingSet
that exposes a value fromPROCESS_MEMORY_COUNTERS
.
GetProcessMemoryInfo
can get it for any process, so we'd want a new set of APIs or to put it with Process. I was unaware that Process was a snapshot at a particular point in time (is this completely true on unix platforms?).
Perhaps we could add a new property to Process
and ProcessStartInfo
that allows the user to change the mode of caching between something like Snapshot
, Current
, and CurrentCached
. If we did this we would want to update all of the relevant APIs at once. Is there a complete list of APIs that would need to be changed, presumably just everything that is changed by Refresh
right?
@hamarb123 thank you for your proposal!
If I understand correctly you would like to change Refresh
to update only the properties that have been used so far by using specialized, cheaper sys-calls?
Do we currently have the possibility to verify whether PrivateMemorySize64
was the only property being initialized so far?
If I understand correctly you would like to change
Refresh
to update only the properties that have been used so far by using specialized, cheaper sys-calls?Do we currently have the possibility to verify whether
PrivateMemorySize64
was the only property being initialized so far?
Currently when we do Refresh()
we get a process snapshot, which requires iterating through every process on the system (I think this is the only way to get a snapshot) and we extract all the information at once from it (which means all the info is consistent with itself). This is why it's so much slower.
However I think there are probably cases where people want this behaviour so I think we should keep this as an option (which is my proposed Snapshot
caching mode).
Note: It is technically a breaking change to change this snapshot behaviour to not be the default, 'the process component is a snapshot of the process resource at the time they are associated.'. So I'm not sure if we want to keep it as the default (but it wouldn't be a breaking change afaik if we do, but also many new users would miss out on this faster code), but we should definitely keep it as an option.
But most of the time I don't think people need more than a few changing properties from a Process
object, and if you only need 1, then this overhead is ridiculous. Hence why I propose 2 new modes Current
and CurrentCached
; CurrentCached
would be the same as Current
except that it will cache the values until a refresh and Current
will return an up-to-date value every time. For CurrentCached
, refresh will just clear the cache and not get anything, and for Current
it will do nothing.
The Current
mode will use APIs per property (or groups of properties if they come all at once e.g. as the memory information does on Windows at least). Since these APIs aren't worried about getting all the process' info at once in a completely consistent state, they are much faster - and if we can't find a way to get a particular piece of information on its own, we can fall back to getting it the way we get it for snapshots.
In terms of CurrentCached
re the groups of properties, I think we should only cache the properties we have already read before (since the last refresh) and the property we're trying to reading; so the user doesn't get a surprising cached value from ages before they think it should be from (due to calculating another property) - but maybe this would be an option too since it's just a change in the caching mechanism and wouldn't be too hard to implement.
@hamarb123 let's say, a developer wants to query a related set of performance variables such as the three CPU numbers (total, user, system). You propose to query the "group" from the kernel just once instead of three times. How would that work? This seems to be in conflict with the idea of the Current
mode to not cache.
That's a good point. I think the Current
and CurrentCached
modes are still valuable as-is, but we probably should have a mode to store all of the grouped values at once, note: this will (probably) also require deciding which values those are and documenting them if we want to avoid undefined behaviour.
It could be called Grouped
, here's how it could work: whenever you request a value in the group, it will get all of the values for the group and cache them together; when you request that value or any other value in the group next time, it will return the cached value (and the cache can be reset with Refresh()
obviously).
Note: instead of locking in the groups permanently, we could add a bitfield enum which lists each property (of type ulong
to make sure we don't run out), and a (static?) method which will return which properties are in the group of a given property, and then developers could query this when they want to make sure they are getting a new enough value in Grouped
mode (and any other mode that may have special behaviour based on groups). I think I'm leaning towards this instead of deciding what the groups are and documenting them, in case we found a way to group more properties, or ungroup them into faster seperate calls in the future.
I would also like to note: the way I would implement it, these modes would all share most of the code (other than Snapshot
, which is inherently quite different) and would be easy to maintain.
Currently when we do Refresh() we get a process snapshot, which requires iterating through every process on the system (I think this is the only way to get a snapshot) and we extract all the information at once from it (which means all the info is consistent with itself). This is why it's so much slower.
This is definitely expensive. I believe that we should try a simpler (but not most optimal) approach first:
When processId
is known:
Don't fetch all processes information:
and then read all fields at once:
Just use corresponding sys-calls for getting the info for single process and assign all of the information at once:
This should be relatively easier to implement as we would not need to introduce a new state flag for "incomplete process information": https://github.com/dotnet/runtime/blob/68cf2474e2503a11e3db3f305f8a61b76e1fef3e/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.cs#L498
and hence avoid the need for cache invalidation (we all know how it usually ends ;) ) and even more differences between Windows and other OS-es implementation for `ProcessManager:.
This could work an order of magnitude faster and be simple (assuming there are no major windows 7 vs 7+ differences) and straightforward to implement, without breaking and risky changes.
@hamarb123 what do you think?
2 issues with this, but otherwise I think it is probably fine - but I'd still prefer to go the route with the different modes (would need to get the additional APIs required with this approved first obviously).
Firstly, would technically be a breaking change as stated here.
Secondly, it's unclear to me how the caching would work, currently you have to call Refresh()
to get current info. Are you suggesting we basically implement the Grouped
mode from above? I'd be fine with this. Or do you mean the Current
mode? This has the issue that grouped properties are requested brand new every time.
Also, the reason I say the different modes would be easy to maintain is because we could do it basically like this (names can be changed obviously, and below is just a rough sketch of how it would work):
static class
that gets the process info from native calls ProcessInfoGetter
- this would be the only part that is OS specific (other than saying an api is not supported). It would have 3 functions, one to get the sizes needed to store the output for the next function, one which takes a ref bitfield enum saying which field to read (which gets changed to which ones were read) and takes spans for the output, and one which says which fields are in a group. Exposing it like this allows us to avoid any unnecessary allocations.ProcessInfoGetter
class to get the info as required. These all share from the same base class.Snapshot
mode (has the same base class and interface).Process
class itself, we add a field to store the class for the current mode, and one to store the enum, and then we change all of the implementations for the properties to use the ones from there.I think this design would be extensible, maintainable, and fast.
Description
If someone wants to get an up-to-date value for
Process.PrivateMemorySize64
, then they have to callProcess.Refresh
. I noticed that this was really slow and I only wanted a value forPrivateMemorySize64
.Windows has an API called
GetProcessMemoryInfo
which we should use to get these value since it is much faster. We can open the process with theOpenProcess
API and store it in a safehandle & useCloseHandle
to close it.Configuration
.NET 6.0.9 x64 Windows 10 21H2 v10.0.19044.2006 x64
Regression?
No
Data
The following bits of code execute in a similar amount of time:
Which means we can do it ~5000x faster (for the speeds on my machine anyway).
Analysis
It seems like we retrieve info for every process every time we do anything (https://github.com/dotnet/runtime/blob/a5f3676cc71e176084f0f7f1f6beeecd86fbeafc/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Windows.cs). We can probably also apply similar optimisations to the other methods - I'd also be interested in investigating the other functions and how they can be optimised once we get this one approved - I'd like to do the different functions as seperate PRs if I get approved to do it.
We should use
GetProcessMemoryInfo
for the info that can be gathered from it. I'd like to make a PR for this if it is approved by the area-owners which would be merged by .NET 8 ideally (please add me to assignees also on this and the PR if approved, thanks).