PowerShell / PowerShell

PowerShell for every system!
https://microsoft.com/PowerShell
MIT License

Can `Get-Process | Export-Csv` be faster? #21607

Closed yg-i closed 1 week ago

yg-i commented 2 weeks ago

Summary of the new feature / enhancement

On my Win10 machine (PSVersion 7.4.2; OS version 10.0.19045), Get-Process | Export-Csv -path "o.csv" takes a whopping 36 seconds for 400+ processes. Moreover, as the linear relationship exhibited below shows, this isn't due to slowness in reading any particular process's information: apparently every process really just takes a lot of time. I'm wondering if there's any room for dramatic (?!) speed improvement in this department? Tools like Process Explorer/Hacker seem capable of showing a large amount of detailed process information almost instantaneously.

(The same speed issue is present for Export-CliXML etc.)

Many thanks,

PS C:\> (Measure-Command { Get-Process | Select-Object -First 5 | Export-Csv -Path "o.csv" }).TotalMilliseconds
376.7013

PS C:\> (Measure-Command { Get-Process | Select-Object -First 10 | Export-Csv -Path "o.csv" }).TotalMilliseconds
652.332

PS C:\> (Measure-Command { Get-Process | Select-Object -First 15 | Export-Csv -Path "o.csv" }).TotalMilliseconds
966.9475

PS C:\> (Measure-Command { Get-Process | Select-Object -First 20 | Export-Csv -Path "o.csv" }).TotalMilliseconds
1243.4443

PS C:\> (Measure-Command { Get-Process | Select-Object -First 25 | Export-Csv -Path "o.csv" }).TotalMilliseconds
1565.5073

Proposed technical implementation details (optional)

No response

rhubarb-geek-nz commented 2 weeks ago

Compare with

pwsh -c "Get-Process" >o.csv

which takes less than a second.

Now compare that with the information written with

 Get-Process | ConvertTo-JSON  | Set-Content -path "o.json"

And now it takes ages, because for each process it tries to get detailed information, including which DLLs are loaded into the process, the threads in the process, memory usage, etc.

Compare that with

get-process | foreach-object { Write-Output $_.Id , $_.MainModule.ModuleName }

Which is less than a second again.

So in summary, be careful what you ask for. Get-Process is quick until you start inquiring for detailed information for each process, which Export-CSV and ConvertTo-JSON will do because they are generic.

So try

get-process | Select-Object -Property Id,MainModule | Export-Csv -LiteralPath o.csv

Which takes less than a second again

However

get-process | Select-Object -Property Id,CommandLine | Export-Csv -LiteralPath o.csv

takes ages, so ask only for what you need.
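
As a rough sketch of that advice, the pipeline below exports only a few properties that are cheap to read. The property list (Id, ProcessName, WorkingSet64) and the output file name are illustrative assumptions, not something prescribed in this thread; substitute the properties you actually need.

```powershell
# Properties chosen for illustration; all are plain values on System.Diagnostics.Process.
$cheapProps = 'Id', 'ProcessName', 'WorkingSet64'

# Hypothetical output location in the temp directory.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'processes-cheap.csv'

# Select-Object copies only the named properties, so Export-Csv
# never has to read the expensive ones.
Get-Process |
    Select-Object -Property $cheapProps |
    Export-Csv -LiteralPath $outFile -NoTypeInformation

# Read the file back to confirm which columns were written.
$columns = (Import-Csv -LiteralPath $outFile | Select-Object -First 1).PSObject.Properties.Name
$columns -join ', '
```

Because Select-Object builds new objects that carry only the listed properties, the generic serializer in Export-Csv has nothing else to enumerate.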

yg-i commented 2 weeks ago

So in summary, be careful what you ask for. Get-Process is quick until you start inquiring for detailed information for each process, which Export-CSV and ConvertTo-JSON will do because they are generic.

Thanks, that's helpful. It seems that there's a subset of approximately 10 properties, out of a total of 69, that especially slow down the read process (the 'Modules' and 'Parent' properties are the slowest among them). When these 10 properties are excluded, the operation completes in just a second or two. Including all 10 of them causes the total execution time to skyrocket to about 200 seconds!

PS C:\tmp> $allProps = (gps | gm -MemberType properties).Name;
PS C:\tmp> $slowProps = ('Modules', 'MainModule', 'Parent', 'CommandLine', 'Company', 'Path', 'Description', 'Product', 'ProductVersion', 'FileVersion');
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 0)}) | Export-Clixml o.xml}).TotalSeconds
1.7832752
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 1)}) | Export-Clixml o.xml}).TotalSeconds
3.719695
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 2)}) | Export-Clixml o.xml}).TotalSeconds
5.3285927
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 3)}) | Export-Clixml o.xml}).TotalSeconds
7.072522
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 4)}) | Export-Clixml o.xml}).TotalSeconds
8.9156198
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 5)}) | Export-Clixml o.xml}).TotalSeconds
10.5895199
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 6)}) | Export-Clixml o.xml}).TotalSeconds
12.3948109
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 7)}) | Export-Clixml o.xml}).TotalSeconds
35.2305779
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 8)}) | Export-Clixml o.xml}).TotalSeconds
91.9951524
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 9)}) | Export-Clixml o.xml}).TotalSeconds
92.1810648
PS C:\tmp> (measure-command {gps | select -Property ($allProps | Where-Object {$_ -notIn ($slowProps | select -SkipLast 10)}) | Export-Clixml o.xml}).TotalSeconds
206.8004993
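
A complementary way to find the expensive properties is to time each one in isolation rather than cumulatively. The following is only a sketch under assumptions: the candidate names are taken from the $slowProps list above (plus Id as a cheap baseline), the sample size of 10 is arbitrary, and the numbers it produces are indicative at best, since results vary by platform, privileges, and PowerShell version.

```powershell
# Candidate properties to time: three names from $slowProps, plus Id as a baseline.
$candidates = 'Id', 'Path', 'CommandLine', 'Parent'

# A small sample keeps the measurement itself short (sample size is arbitrary).
$sample = Get-Process | Select-Object -First 10

$timings = foreach ($name in $candidates) {
    $elapsed = Measure-Command {
        foreach ($p in $sample) {
            # Reading the property forces the lookup. Failures such as
            # access-denied are ignored because only the cost matters here.
            try { $null = $p.$name } catch { }
        }
    }
    [pscustomobject]@{
        Property     = $name
        Milliseconds = [math]::Round($elapsed.TotalMilliseconds, 1)
    }
}

# Slowest properties first.
$timings | Sort-Object -Property Milliseconds -Descending
```
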

237dmitry commented 1 week ago

In my opinion it works pretty fast:

$ Get-Process | Export-Csv -path "o.csv"
$ (Get-History -Count 1).Duration.TotalMilliseconds   
411.0004

$ (Import-Csv ./o.csv | Get-Member -MemberType NoteProperty).Count
69

$ (Get-Content ./o.csv).Count
203

$ (Get-Item ./o.csv).Size
117463

7.5.0-preview.1 on Linux.

rhubarb-geek-nz commented 1 week ago

In my opinion it works pretty fast: 7.5.0-preview.1 on Linux.

Yes, but this is not the same as the original poster's environment, which is a Win10 machine (PSVersion 7.4.2; OS version 10.0.19045).

Process management on Windows is completely different from that on Linux, both conceptually and in the APIs used to retrieve the information.

237dmitry commented 1 week ago

this is not the same as the original posters environment

I thought about this aspect. But I can’t test on Windows (replaced it with Linux). I posted it just for comparison and the overall picture.

SteveL-MSFT commented 1 week ago

@rhubarb-geek-nz provided the right solution, which is to serialize only what you actually need. Getting the .NET object is fast because some properties are just references whose values aren't retrieved until they are actually used, for example during serialization.
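
The deferred retrieval described here can be observed with a minimal sketch like the one below. It assumes that Modules is one of the deferred properties (as the measurements earlier in the thread suggest) and reads it for only the first five processes to keep the run short; the variable names are illustrative.

```powershell
$watch = [System.Diagnostics.Stopwatch]::StartNew()

# Step 1: fetch the process objects only. No expensive property is read yet.
$procs = Get-Process
$fetchMs = $watch.Elapsed.TotalMilliseconds

# Step 2: read one deferred property on a handful of processes.
$firstFive = $procs | Select-Object -First 5
$watch.Restart()
foreach ($p in $firstFive) {
    # Access can fail for protected processes; the failure is ignored here.
    try { $null = $p.Modules } catch { }
}
$touchMs = $watch.Elapsed.TotalMilliseconds

$result = [pscustomobject]@{
    ProcessCount        = @($procs).Count
    FetchAllMs          = [math]::Round($fetchMs, 1)
    ReadModulesFirst5Ms = [math]::Round($touchMs, 1)
}
$result
```

Comparing the two timings on your own machine shows how much of the cost comes from reading the property rather than from obtaining the objects; a generic serializer pays that cost for every property of every process.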

microsoft-github-policy-service[bot] commented 1 week ago

This issue has been marked as answered and has not had any activity for 1 day. It has been closed for housekeeping purposes.
