dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

System.Threading.Tasks.Tests.Perf_AsyncMethods.Yield regressed on ARM64 #66837

Open adamsitnik opened 2 years ago

adamsitnik commented 2 years ago

System.Threading.Tasks.Tests.Perf_AsyncMethods.Yield seems to be quite noisy, but it has definitely regressed on ARM64.

The reporting system does not show it for [Windows arm64](https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_arm64_Windows%2010.0.19041%2fSystem.Threading.Tasks.Tests.Perf_AsyncMethods.Yield.html), but I am able to consistently reproduce it on a Surface Pro X. So it may be caused by something that is enabled in the SDK but not with corerun (perf lab runs use corerun from a local dotnet/runtime build, while for the monthly perf runs we use the SDK that we ship).

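![image](https://user-images.githubusercontent.com/6011991/159037367-adeaa972-d9b9-4856-9782-52cb25341fe0.png)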

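Surprisingly, for [Ubuntu arm64](https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_arm64_ubuntu%2018.04%2fSystem.Threading.Tasks.Tests.Perf_AsyncMethods.Yield.html) the reporting system shows an improvement. But this time I have not received any Ubuntu arm64 inputs, so I can't confirm or deny it.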

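![image](https://user-images.githubusercontent.com/6011991/159037756-817ddd0a-f4fd-49d2-9a18-b1b0d12f9a97.png)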

@AndyAyersMS you should be able to reproduce it on your M1

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 net7.0 --filter System.Threading.Tasks.Tests.Perf_AsyncMethods.Yield --architecture arm64

| Result | Base    | Diff   | Ratio | Modality | Operating System      | Bit   |
| ------ | -------:| ------:| -----:| -------- | --------------------- | ----- |
| Faster | 1295.71 | 520.60 | 2.49  |          | Windows 11            | X64   |
| Faster | 990.40  | 328.88 | 3.01  |          | Windows 11            | X64   |
| Same   | 403.16  | 368.82 | 1.09  |          | Windows 11            | X64   |
| Faster | 588.08  | 294.85 | 1.99  |          | Windows 10            | X64   |
| Faster | 855.19  | 352.81 | 2.42  |          | Windows 11            | X64   |
| Slower | 273.40  | 345.52 | 0.79  | bimodal  | Windows 11            | X64   |
| Faster | 1057.37 | 336.21 | 3.14  |          | ubuntu 18.04          | X64   |
| Faster | 1048.20 | 347.24 | 3.02  |          | ubuntu 20.04          | X64   |
| Faster | 851.95  | 362.10 | 2.35  |          | ubuntu 18.04          | X64   |
| Same   | 551.51  | 575.17 | 0.96  |          | ubuntu 18.04          | X64   |
| Slower | 314.53  | 370.69 | 0.85  |          | pop 20.04             | X64   |
| Same   | 333.26  | 318.63 | 1.05  | several? | alpine 3.13           | X64   |
| Same   | 310.12  | 320.67 | 0.97  |          | debian 11             | X64   |
| Slower | 157.43  | 391.50 | 0.40  |          | macOS Monterey 12.2.1 | Arm64 |
| Slower | 450.64  | 876.74 | 0.51  | bimodal  | Windows 10            | Arm64 |
| Slower | 326.61  | 822.06 | 0.40  |          | Windows 11            | Arm64 |
| Same   | 441.79  | 441.64 | 1.00  | several? | Windows 10            | X86   |
| Same   | 310.05  | 350.47 | 0.88  | bimodal  | Windows 10            | X86   |
| Same   | 370.33  | 348.84 | 1.06  |          | Windows 10            | X86   |
| Slower | 609.16  | 999.41 | 0.61  |          | Windows 10            | Arm   |
| Slower | 262.75  | 312.75 | 0.84  | bimodal  | macOS Big Sur 11.6.3  | X64   |
| Slower | 278.15  | 329.83 | 0.84  |          | macOS Monterey 12.2.1 | X64   |
| Same   | 254.11  | 275.46 | 0.92  |          | macOS Monterey 12.2.1 | X64   |

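For readers skimming the table, the Ratio column appears to be Base divided by Diff, so values below 1.00 mean the Diff run took longer. That reading is an assumption inferred from the numbers themselves, not something stated in the thread. Treating Base as net6.0 and Diff as net7.0 is likewise a guess based on the `-f net6.0 net7.0` argument order. A minimal Python sketch that rechecks the ARM rows (the helper names `ratio` and `slowdown_percent` are made up for illustration):

```python
# Rows copied from the table above: (base, diff, reported_ratio, os, arch).
# The measurement unit is not stated in the thread, so none is assumed here.
ROWS = [
    (157.43, 391.50, 0.40, "macOS Monterey 12.2.1", "Arm64"),
    (450.64, 876.74, 0.51, "Windows 10", "Arm64"),
    (326.61, 822.06, 0.40, "Windows 11", "Arm64"),
    (609.16, 999.41, 0.61, "Windows 10", "Arm"),
    (1295.71, 520.60, 2.49, "Windows 11", "X64"),  # one x64 row for contrast
]


def ratio(base: float, diff: float) -> float:
    """Assumed definition of the Ratio column: base time over diff time."""
    return base / diff


def slowdown_percent(base: float, diff: float) -> float:
    """How much longer the diff run took than the base run, in percent."""
    return (diff / base - 1.0) * 100.0


if __name__ == "__main__":
    for base, diff, reported, os_name, arch in ROWS:
        print(
            f"{os_name:<22} {arch:<6} "
            f"ratio={ratio(base, diff):.2f} (table: {reported:.2f}), "
            f"diff vs base: {slowdown_percent(base, diff):+.0f}%"
        )
```

Under that reading, the three Arm64 rows and the Arm row all have a ratio well below 1.00, which matches their "Slower" label.
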
ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-threading-tasks

See info in area-owners.md if you want to be subscribed.

Author: adamsitnik
Assignees: -
Labels: `arch-arm64`, `area-System.Threading.Tasks`, `tenet-performance`
Milestone: -
stephentoub commented 2 years ago

All await Task.Yield() does is queue a work item to the ThreadPool, so if there's a regression here, it's almost certainly around the ThreadPool. cc: @kouvel

ghost commented 2 years ago

Tagging subscribers to this area: @mangod9

See info in area-owners.md if you want to be subscribed.

Author: adamsitnik
Assignees: -
Labels: `arch-arm64`, `area-System.Threading`, `tenet-performance`
Milestone: -