Open neonene opened 1 year ago
3.10
seems fine withv.1933
.
This is interesting.
Does this mean that the team working on improvements have regressed on Windows (in ways specific to the latest compiler fixes)?
Yeah, we haven't been able to run Windows benchmarks ourselves.
Which MSVC build are you using for the py.org builds, Steve?
The latest public version - version number is embedded in sys.version: '3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]'
I typically update my build machine before doing releases. My experience with MSVC has been that either you pin its version and work around bugs as you find them, or you assume that any bugfix is worth taking and stick to the tip. It's not viable to only update sometimes, as any issues are going to be really hard to discover.
So now it would be useful to be able to run benchmarks for Windows, to confirm neonene's report.
3.11
and3.12
local PGO builds withMSC v.1933
are 6%~ slower than the PGO withv.1929
(vs2019) on my pyperformance macro benchmark.3.11rc2+ (2022-9-18 8e2bda8)
``` > pyperf compare_to v1929 v1933 -G Slower (51): - unpack_sequence: 81.7 ns +- 0.8 ns -> 103 ns +- 5 ns: 1.26x slower - scimark_monte_carlo: 100 ms +- 1 ms -> 123 ms +- 1 ms: 1.23x slower - nbody: 163 ms +- 2 ms -> 193 ms +- 2 ms: 1.19x slower - sqlalchemy_imperative: 46.6 ms +- 1.3 ms -> 53.3 ms +- 8.9 ms: 1.14x slower - scimark_sparse_mat_mult: 6.34 ms +- 0.06 ms -> 7.22 ms +- 0.09 ms: 1.14x slower - logging_simple: 22.7 us +- 0.5 us -> 25.7 us +- 1.8 us: 1.13x slower - scimark_lu: 157 ms +- 2 ms -> 177 ms +- 2 ms: 1.13x slower - sympy_integrate: 36.4 ms +- 0.4 ms -> 41.0 ms +- 4.2 ms: 1.13x slower - sympy_expand: 877 ms +- 14 ms -> 987 ms +- 65 ms: 1.13x slower - scimark_sor: 187 ms +- 3 ms -> 211 ms +- 1 ms: 1.12x slower - float: 138 ms +- 1 ms -> 153 ms +- 2 ms: 1.11x slower - pickle_pure_python: 574 us +- 7 us -> 632 us +- 10 us: 1.10x slower - go: 246 ms +- 3 ms -> 271 ms +- 3 ms: 1.10x slower - nqueens: 162 ms +- 2 ms -> 177 ms +- 1 ms: 1.10x slower - scimark_fft: 492 ms +- 6 ms -> 539 ms +- 6 ms: 1.10x slower - sympy_sum: 318 ms +- 10 ms -> 348 ms +- 28 ms: 1.09x slower - fannkuch: 615 ms +- 9 ms -> 672 ms +- 11 ms: 1.09x slower - unpickle_pure_python: 392 us +- 4 us -> 427 us +- 8 us: 1.09x slower - logging_silent: 177 ns +- 3 ns -> 193 ns +- 2 ns: 1.09x slower - regex_compile: 236 ms +- 2 ms -> 256 ms +- 4 ms: 1.09x slower - html5lib: 117 ms +- 12 ms -> 127 ms +- 12 ms: 1.08x slower - richards: 83.9 ms +- 0.9 ms -> 90.8 ms +- 0.7 ms: 1.08x slower - deltablue: 7.05 ms +- 0.13 ms -> 7.59 ms +- 0.11 ms: 1.08x slower - sqlalchemy_declarative: 264 ms +- 8 ms -> 283 ms +- 14 ms: 1.07x slower - pyflate: 807 ms +- 12 ms -> 863 ms +- 8 ms: 1.07x slower - sympy_str: 543 ms +- 13 ms -> 578 ms +- 14 ms: 1.06x slower - chaos: 137 ms +- 1 ms -> 146 ms +- 1 ms: 1.06x slower - xml_etree_process: 107 ms +- 2 ms -> 114 ms +- 2 ms: 1.06x slower - meteor_contest: 156 ms +- 3 ms -> 164 ms +- 5 ms: 1.05x slower - django_template: 87.2 ms +- 1.6 ms -> 91.5 ms +- 1.6 ms: 1.05x slower - crypto_pyaes: 123 ms +- 2 ms -> 129 ms +- 1 ms: 1.05x slower - chameleon: 15.9 ms +- 0.3 ms -> 16.6 ms +- 0.3 ms: 1.05x slower - logging_format: 24.7 us +- 0.8 us -> 25.8 us +- 0.4 us: 1.05x slower - telco: 11.7 ms +- 0.1 ms -> 12.2 ms +- 0.1 ms: 1.04x slower - spectral_norm: 173 ms +- 1 ms -> 180 ms +- 1 ms: 1.04x slower - regex_effbot: 4.13 ms +- 0.04 ms -> 4.29 ms +- 0.05 ms: 1.04x slower - raytrace: 566 ms +- 5 ms -> 589 ms +- 6 ms: 1.04x slower - pickle: 21.0 us +- 0.2 us -> 21.8 us +- 0.6 us: 1.04x slower - hexiom: 11.0 ms +- 0.1 ms -> 11.4 ms +- 0.1 ms: 1.04x slower - regex_v8: 31.4 ms +- 0.2 ms -> 32.5 ms +- 0.3 ms: 1.04x slower - 2to3: 593 ms +- 2 ms -> 612 ms +- 9 ms: 1.03x slower - xml_etree_generate: 152 ms +- 5 ms -> 156 ms +- 5 ms: 1.03x slower - mako: 20.5 ms +- 0.3 ms -> 21.0 ms +- 0.4 ms: 1.03x slower - tornado_http: 345 ms +- 21 ms -> 353 ms +- 23 ms: 1.02x slower - pickle_list: 7.40 us +- 0.11 us -> 7.56 us +- 0.05 us: 1.02x slower - unpickle: 23.1 us +- 0.5 us -> 23.6 us +- 0.4 us: 1.02x slower - xml_etree_iterparse: 197 ms +- 10 ms -> 200 ms +- 6 ms: 1.02x slower - regex_dna: 255 ms +- 2 ms -> 259 ms +- 2 ms: 1.01x slower - dulwich_log: 244 ms +- 8 ms -> 248 ms +- 3 ms: 1.01x slower - json_dumps: 22.7 ms +- 0.3 ms -> 22.9 ms +- 0.4 ms: 1.01x slower - pickle_dict: 46.3 us +- 0.5 us -> 46.6 us +- 0.3 us: 1.01x slower Faster (4): - unpickle_list: 7.94 us +- 0.33 us -> 7.05 us +- 0.09 us: 1.13x faster - pathlib: 341 ms +- 42 ms -> 314 ms +- 37 ms: 1.09x faster - sqlite_synth: 5.67 us +- 0.06 us -> 5.61 us +- 0.12 us: 1.01x faster - pidigits: 269 ms +- 2 ms -> 268 ms +- 1 ms: 1.01x faster Benchmark hidden because not significant (2): json_loads, xml_etree_parse Geometric mean: 1.06x slower ```And python.org official binaries have 8% gap between 3.11rc2(
v.1933
) and 3.11rc1 (v.1932
). (A few percent of the slowdown is necessary from bugfix commits)At one glance, function inlinining in
_PyEval_EvalFrameDefault()
looks OK. Build logs showed only the following difference:8816 of 79666 inline instances were from dead/cold paths
6591 of 77031 inline instances were from dead/cold paths
3.10
seems fine withv.1933
. I'm not sure if this applies to everyone.cc: @zooba