astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
28.59k stars 926 forks source link

Use `CompactString` for `Identifier` #12101

Closed MichaReiser closed 6 hours ago

MichaReiser commented 2 days ago

This PR introduces a small string create for ExprName and Identifier to reduce the number of allocations.

Performance improvement

This change should also reduce peak-memory usage.

Why CompactString

CompactString shows better performance in the read path than smol_str. The only disadvantage compared to smol_str is that smol_str supports O(1) cloning. I had to update the red-knot symbol table to store references to avoid allocating new strings (it actually already allocated new strings, but we could have removed those allocations when the AST stores smol_str).

codspeed-hq[bot] commented 2 days ago

CodSpeed Performance Report

Merging #12101 will improve performances by 7.54%

Comparing identifier-compact_str (43ea86c) with main (db6ee74)

Summary

⚡ 6 improvements ✅ 24 untouched benchmarks

Benchmarks breakdown

Benchmark main identifier-compact_str Change
lexer[large/dataset.py] 1.1 ms 1.1 ms +6.06%
lexer[numpy/ctypeslib.py] 227.2 µs 216.3 µs +5.06%
lexer[pydantic/types.py] 506.2 µs 481.6 µs +5.11%
lexer[unicode/pypinyin.py] 77.4 µs 74.2 µs +4.3%
linter/default-rules[pydantic/types.py] 1.9 ms 1.8 ms +7.54%
parser[numpy/ctypeslib.py] 953.7 µs 915 µs +4.23%
github-actions[bot] commented 2 days ago

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -1 violations, +0 -0 fixes in 1 projects; 1 project error; 48 projects unchanged)

python/typeshed (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W

- stdlib/_collections_abc.pyi:10:5: PYI057 Do not use `typing.ByteString`, which has unclear semantics and is deprecated

demisto/content (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

``` warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`: - 'ignore' -> 'lint.ignore' - 'select' -> 'lint.select' - 'unfixable' -> 'lint.unfixable' - 'per-file-ignores' -> 'lint.per-file-ignores' warning: `PGH001` has been remapped to `S307`. warning: `PGH002` has been remapped to `G010`. warning: `PLR1701` has been remapped to `SIM101`. ruff failed Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled. ```

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix | | ---- | ------- | --------- | -------- | ----- | ---- | | PYI057 | 1 | 0 | 1 | 0 | 0 |

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

``` warning: Detected debug build without --no-cache. error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression ```

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

demisto/content (error)

ruff format --preview --exclude Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py

``` warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`: - 'ignore' -> 'lint.ignore' - 'select' -> 'lint.select' - 'unfixable' -> 'lint.unfixable' - 'per-file-ignores' -> 'lint.per-file-ignores' warning: `PGH001` has been remapped to `S307`. warning: `PGH002` has been remapped to `G010`. warning: `PLR1701` has been remapped to `SIM101`. ruff failed Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled. ```

openai/openai-cookbook (error)

ruff format --preview

``` warning: Detected debug build without --no-cache. error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression ```

charliermarsh commented 1 day ago

Nice!

charliermarsh commented 1 day ago

How did you determine that CompactString shows better results than smol_str? (It seems like a totally reasonable conclusion to me, just curious.)

MichaReiser commented 1 day ago

Sorry, I should have mentioned this in the PR description.

I first started by using smol_str because the O(1) cloning is nice, see https://github.com/astral-sh/ruff/pull/12099. The PR looked good at first because it significantly improved performance. But that was too good to be true and I realised that it mainly was because of an unnecessary allocation in parse_identifier. I also noticed that the lexer and parser benchmarks improved across the board, but that many linter benchmarks regressed, probably because accessing a string now required more branching. That's when I started to try out CompactString which showed better improvements in the Lexer and Parser benchmarks, without regressing the Linter benchmarks as much.

I rebased and reopened https://github.com/astral-sh/ruff/pull/12099 for a direct comparison.