dotnet / perf-autofiling-issues

A landing place for auto-filed performance issues before they receive triage
MIT License
9 stars 4 forks source link

[Perf] Linux/x64: 49 Regressions on 3/7/2024 2:55:42 PM #30899

Open performanceautofiler[bot] opened 4 months ago

performanceautofiler[bot] commented 4 months ago

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Tests.Perf_String

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
102.40 ns 120.22 ns 1.17 0.25 False
105.91 ns 125.72 ns 1.19 0.25 False
136.31 ns 154.74 ns 1.14 0.22 False
154.36 ns 193.55 ns 1.25 0.19 False
167.54 ns 188.93 ns 1.13 0.23 False
169.54 ns 194.51 ns 1.15 0.24 False

graph graph graph graph graph graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_String*'
### Payloads [Baseline]() [Compare]() ### System.Tests.Perf_String.Substring_IntInt(s: "dzsdzsDDZSDZSDZSddsz", i1: 10, i2: 1) #### ETL Files #### Histogram #### JIT Disasms ### System.Tests.Perf_String.Substring_IntInt(s: "dzsdzsDDZSDZSDZSddsz", i1: 0, i2: 8) #### ETL Files #### Histogram #### JIT Disasms ### System.Tests.Perf_String.Remove_Int(s: "dzsdzsDDZSDZSDZSddsz", i: 10) #### ETL Files #### Histogram #### JIT Disasms ### System.Tests.Perf_String.TrimEnd(s: "Test ") #### ETL Files #### Histogram #### JIT Disasms ### System.Tests.Perf_String.Trim_CharArr(s: " Test", c: [' ', ' ']) #### ETL Files #### Histogram #### JIT Disasms ### System.Tests.Perf_String.TrimEnd_CharArr(s: "Test ", c: [' ', ' ']) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
2.06 ms 2.38 ms 1.15 0.27 True
73.55 μs 85.90 μs 1.17 0.32 True
219.68 μs 268.27 μs 1.22 0.44 True
339.31 μs 362.29 μs 1.07 0.06 True
136.79 μs 151.91 μs 1.11 0.31 False
207.34 μs 263.50 μs 1.27 0.50 False
83.87 μs 103.78 μs 1.24 0.24 True
1.28 ms 1.56 ms 1.22 0.24 False
6.12 μs 6.96 μs 1.14 0.16 True
4.94 ms 6.78 ms 1.37 0.47 True
54.34 μs 65.55 μs 1.21 0.29 False
88.07 μs 104.81 μs 1.19 0.42 False
310.93 μs 345.99 μs 1.11 0.19 True
2.09 ms 2.58 ms 1.23 0.35 True
2.30 ms 2.74 ms 1.19 0.49 False
5.53 ms 6.44 ms 1.17 0.41 True
4.92 ms 7.09 ms 1.44 0.39 True
217.74 μs 276.85 μs 1.27 0.52 True
12.71 μs 14.45 μs 1.14 0.21 False
98.42 μs 115.08 μs 1.17 0.36 False
1.87 ms 2.19 ms 1.17 0.29 True

graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sigmoid(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Exp(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Ieee754Remainder_ScalarDivisor(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Distance(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sigmoid(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sin(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Distance(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sin(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sinh(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Ieee754Remainder_ScalarDividend(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sinh(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Log(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Ieee754Remainder_ScalarDividend(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Log(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Exp(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
performanceautofiler[bot] commented 4 months ago

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
1.83 μs 2.25 μs 1.23 0.24 False
2.80 μs 3.49 μs 1.25 0.13 True
32.00 μs 36.39 μs 1.14 0.26 False
3.14 μs 3.54 μs 1.13 0.24 False
1.66 μs 1.77 μs 1.07 0.21 False
3.14 μs 3.62 μs 1.15 0.06 True
1.64 μs 1.72 μs 1.05 0.24 False
3.57 μs 3.99 μs 1.12 0.18 False

graph graph graph graph graph graph graph graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Double&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Negate(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.SumOfMagnitudes(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Add_Scalar(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.SumOfSquares(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Divide_Scalar(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.AddMultiply_ScalarAddend(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Add_Scalar(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.AddMultiply_Vectors(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
37.33 μs 41.93 μs 1.12 0.44 False
13.53 μs 14.58 μs 1.08 0.08 False
944.38 μs 1.16 ms 1.23 0.32 False
37.94 μs 48.29 μs 1.27 0.42 False
62.23 μs 70.05 μs 1.13 0.47 False
37.58 μs 50.85 μs 1.35 0.42 False
942.43 μs 1.11 ms 1.18 0.48 False
1.52 ms 1.65 ms 1.09 0.37 False

graph graph graph graph graph graph graph graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Single&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Exp(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Ieee754Remainder_ScalarDivisor(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sinh(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sinh(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Pow_ScalarBase(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sigmoid(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sigmoid(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Pow_ScalarExponent(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in Struct.GSeq

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
684.52 μs 787.38 μs 1.15 0.18 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Struct.GSeq*'
### Payloads [Baseline]() [Compare]() ### Struct.GSeq.FilterSkipMapSum #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Memory.Span<Char>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
297.37 ns 321.47 ns 1.08 0.10 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span&lt;Char&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Memory.Span<Char>.SequenceEqual(Size: 512) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Collections.CtorGivenSize<String>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
215.89 ns 241.29 ns 1.12 0.12 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.CtorGivenSize&lt;String&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Collections.CtorGivenSize<String>.List(Size: 512) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Collections.CtorFromCollectionNonGeneric<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
154.21 μs 178.11 μs 1.16 0.07 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.CtorFromCollectionNonGeneric&lt;Int32&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Collections.CtorFromCollectionNonGeneric<Int32>.ArrayList(Size: 512) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in SciMark2.kernel

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
2.02 secs 2.17 secs 1.08 0.01 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'SciMark2.kernel*'
### Payloads [Baseline]() [Compare]() ### SciMark2.kernel.benchSOR #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro

Regressions in System.Text.Json.Tests.Perf_Guids

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
25.62 ms 28.98 ms 1.13 0.15 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Json.Tests.Perf_Guids*'
### Payloads [Baseline]() [Compare]() ### System.Text.Json.Tests.Perf_Guids.WriteGuids(Formatted: False, SkipValidation: True) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
kg commented 4 months ago

Likely https://github.com/dotnet/runtime/pull/99273. The size of the regression makes me think it's probably because the old (broken) heuristic was not inserting traces at the "right" places, and the traces it's inserting for these scenarios aren't profitable. May investigate the worst ones.

EDIT: Most of these regressions seem to be in the tensor code, which looks to be extremely generic code that wraps vector operators, and we also know as an existing thing that

  1. we don't implement most of the vector ops on wasm interp yet, and may never implement them all
  2. perf for scalar fallback on vector ops regressed recently (this may improve soon)
kg commented 4 months ago

The Trim and Substring ones look like they were probably impacted by the changes that introduced more safepoints into jiterpreter traces, so the recent fix to remove safepoints may make those go away. Looking at a quick profile they seem to spend half their execution time in traces, though a good chunk of that is dominated by time spent allocating strings: image Looking at the opcodes for the traces in question there are quite a few imm safepoint branch opcodes in there which were introduced recently by an interpreter optimization. The jiterp should now handle those opcodes in a more efficient way.

radekdoulik commented 4 months ago

the vector performance indeed regressed recently, the firefox impact is a bit higher than chrome's one. https://radekdoulik.github.io/WasmPerformanceMeasurements/?startDate=2024-02-27T23%3A01%3A28.000Z&endDate=2024-03-12T22%3A53%3A01.000Z&tasks=%2CVector&flavors=2%2C3%2C14%2C15

image
kg commented 4 months ago

I'm hoping https://github.com/dotnet/runtime/pull/99706 will claw back a lot of the vector perf if it lands, though it's impossible for me to measure locally (too much noise)