Use `CompactString` for `Identifier`

MichaReiser commented 2 days ago

This PR introduces a small string create for ExprName and Identifier to reduce the number of allocations.

moves the Name struct from red_knot_python_semantic to ruff_python_ast
Change ExprName and Identifier to store a Name instead of a String
Change Name to use CompactString which shows better performance than smol_str

Performance improvement

Less time spent allocating
Less time spent deallocating (drop). This especially evident in the linter benchmarks

This change should also reduce peak-memory usage.

Why `CompactString`

CompactString shows better performance in the read path than smol_str. The only disadvantage compared to smol_str is that smol_str supports O(1) cloning. I had to update the red-knot symbol table to store references to avoid allocating new strings (it actually already allocated new strings, but we could have removed those allocations when the AST stores smol_str).

codspeed-hq[bot] commented 2 days ago

CodSpeed Performance Report

Merging #12101 will improve performances by 7.54%

_{Comparing identifier-compact_str (43ea86c) with main (db6ee74)}

Summary

⚡ 6 improvements ✅ 24 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`identifier-compact_str`	Change
⚡	`lexer[large/dataset.py]`	1.1 ms	1.1 ms	+6.06%
⚡	`lexer[numpy/ctypeslib.py]`	227.2 µs	216.3 µs	+5.06%
⚡	`lexer[pydantic/types.py]`	506.2 µs	481.6 µs	+5.11%
⚡	`lexer[unicode/pypinyin.py]`	77.4 µs	74.2 µs	+4.3%
⚡	`linter/default-rules[pydantic/types.py]`	1.9 ms	1.8 ms	+7.54%
⚡	`parser[numpy/ctypeslib.py]`	953.7 µs	915 µs	+4.23%

github-actions[bot] commented 2 days ago

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -1 violations, +0 -0 fixes in 1 projects; 1 project error; 48 projects unchanged)

python/typeshed (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W

- stdlib/_collections_abc.pyi:10:5: PYI057 Do not use `typing.ByteString`, which has unclear semantics and is deprecated

demisto/content (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

``` warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`: - 'ignore' -> 'lint.ignore' - 'select' -> 'lint.select' - 'unfixable' -> 'lint.unfixable' - 'per-file-ignores' -> 'lint.per-file-ignores' warning: `PGH001` has been remapped to `S307`. warning: `PGH002` has been remapped to `G010`. warning: `PLR1701` has been remapped to `SIM101`. ruff failed Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled. ```

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix | | ---- | ------- | --------- | -------- | ----- | ---- | | PYI057 | 1 | 0 | 1 | 0 | 0 |

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

``` warning: Detected debug build without --no-cache. error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression ```

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

demisto/content (error)

ruff format --preview --exclude Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py

``` warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`: - 'ignore' -> 'lint.ignore' - 'select' -> 'lint.select' - 'unfixable' -> 'lint.unfixable' - 'per-file-ignores' -> 'lint.per-file-ignores' warning: `PGH001` has been remapped to `S307`. warning: `PGH002` has been remapped to `G010`. warning: `PLR1701` has been remapped to `SIM101`. ruff failed Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled. ```

openai/openai-cookbook (error)

ruff format --preview

``` warning: Detected debug build without --no-cache. error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression ```

charliermarsh commented 1 day ago

Nice!

charliermarsh commented 1 day ago

How did you determine that CompactString shows better results than smol_str? (It seems like a totally reasonable conclusion to me, just curious.)

MichaReiser commented 1 day ago

Sorry, I should have mentioned this in the PR description.

I first started by using smol_str because the O(1) cloning is nice, see https://github.com/astral-sh/ruff/pull/12099. The PR looked good at first because it significantly improved performance. But that was too good to be true and I realised that it mainly was because of an unnecessary allocation in parse_identifier. I also noticed that the lexer and parser benchmarks improved across the board, but that many linter benchmarks regressed, probably because accessing a string now required more branching. That's when I started to try out CompactString which showed better improvements in the Lexer and Parser benchmarks, without regressing the Linter benchmarks as much.

I rebased and reopened https://github.com/astral-sh/ruff/pull/12099 for a direct comparison.

astral-sh / ruff