Closed MichaReiser closed 6 hours ago
Comparing `identifier-compact_str` (43ea86c) with `main` (db6ee74)
⚡ 6 improvements
✅ 24 untouched benchmarks
| | Benchmark | `main` | `identifier-compact_str` | Change |
|---|---|---|---|---|
| ⚡ | `lexer[large/dataset.py]` | 1.1 ms | 1.1 ms | +6.06% |
| ⚡ | `lexer[numpy/ctypeslib.py]` | 227.2 µs | 216.3 µs | +5.06% |
| ⚡ | `lexer[pydantic/types.py]` | 506.2 µs | 481.6 µs | +5.11% |
| ⚡ | `lexer[unicode/pypinyin.py]` | 77.4 µs | 74.2 µs | +4.3% |
| ⚡ | `linter/default-rules[pydantic/types.py]` | 1.9 ms | 1.8 ms | +7.54% |
| ⚡ | `parser[numpy/ctypeslib.py]` | 953.7 µs | 915 µs | +4.23% |
`ruff-ecosystem` results
✅ ecosystem check detected no linter changes.
ℹ️ ecosystem check detected linter changes. (+0 -1 violations, +0 -0 fixes in 1 projects; 1 project error; 48 projects unchanged)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W
- stdlib/_collections_abc.pyi:10:5: PYI057 Do not use `typing.ByteString`, which has unclear semantics and is deprecated
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
```
warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'unfixable' -> 'lint.unfixable'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
warning: `PGH001` has been remapped to `S307`.
warning: `PGH002` has been remapped to `G010`.
warning: `PLR1701` has been remapped to `SIM101`.
ruff failed
  Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled.
```

| code | total | + violation | - violation | + fix | - fix |
| ---- | ----- | ----------- | ----------- | ----- | ----- |
| PYI057 | 1 | 0 | 1 | 0 | 0 |
ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)
```
warning: Detected debug build without --no-cache.
error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression
```
ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)
ruff format --preview --exclude Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py
```
warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'unfixable' -> 'lint.unfixable'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
warning: `PGH001` has been remapped to `S307`.
warning: `PGH002` has been remapped to `G010`.
warning: `PLR1701` has been remapped to `SIM101`.
ruff failed
  Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled.
```
ruff format --preview
```
warning: Detected debug build without --no-cache.
error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression
```
Nice!
How did you determine that `CompactString` shows better results than `smol_str`? (It seems like a totally reasonable conclusion to me, just curious.)
Sorry, I should have mentioned this in the PR description.
I first started by using `smol_str` because the `O(1)` cloning is nice, see https://github.com/astral-sh/ruff/pull/12099. The PR looked good at first because it significantly improved performance. But that was too good to be true, and I realised that it was mainly because of an unnecessary allocation in `parse_identifier`. I also noticed that the lexer and parser benchmarks improved across the board, but that many linter benchmarks regressed, probably because accessing a string now required more branching. That's when I started to try out `CompactString`, which showed better improvements in the lexer and parser benchmarks without regressing the linter benchmarks as much.
I rebased and reopened https://github.com/astral-sh/ruff/pull/12099 for a direct comparison.
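For readers unfamiliar with the `O(1)` cloning trade-off mentioned above, here is a minimal sketch in plain Rust. `SharedName` and `OwnedName` are hypothetical types written only for this illustration; they are not the `smol_str` or `CompactString` implementations. `SharedName` simply assumes reference-counted sharing through `Arc<str>`, while `OwnedName` copies its bytes on every clone.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a string type with O(1) cloning.
/// Cloning only bumps a reference count; the bytes stay shared.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SharedName(Arc<str>);

impl SharedName {
    pub fn new(text: &str) -> Self {
        Self(Arc::from(text))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True when both values point at the same heap buffer.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

/// Hypothetical stand-in for an owned string whose clone
/// allocates a new buffer and copies the bytes (O(n)).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct OwnedName(Box<str>);

impl OwnedName {
    pub fn new(text: &str) -> Self {
        Self(Box::from(text))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True when both values point at the same heap buffer.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        std::ptr::eq(self.0.as_ptr(), other.0.as_ptr())
    }
}
```

Cloning a `SharedName` yields a value that shares the original buffer, whereas cloning a non-empty `OwnedName` yields an equal value backed by a separate allocation.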
This PR introduces a small string crate for `ExprName` and `Identifier` to reduce the number of allocations.
- Move the `Name` struct from `red_knot_python_semantic` to `ruff_python_ast`
- Change `ExprName` and `Identifier` to store a `Name` instead of a `String`
- Change `Name` to use `CompactString`, which shows better performance than `smol_str`
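As a rough illustration of why a small-string type reduces allocations for identifiers, the following is a toy sketch using only the standard library. It is not the layout used by `compact_str` or `smol_str`, and it is not the `Name` type from this PR: the `Repr` enum, the 23-byte `INLINE_CAPACITY`, and the `is_inline` helper are all assumptions made for the example. Short names are stored inline in the value itself, and only longer ones fall back to the heap. The `match` in `as_str` also hints at the extra branch that every read pays for this kind of representation.

```rust
/// Longest string, in bytes, that this toy type stores inline.
/// Arbitrary choice for the sketch, not taken from any real crate.
const INLINE_CAPACITY: usize = 23;

/// Toy representation: inline bytes for short strings, heap otherwise.
#[derive(Clone)]
enum Repr {
    Inline { len: u8, buf: [u8; INLINE_CAPACITY] },
    Heap(Box<str>),
}

/// Hypothetical identifier name with a small-string optimisation.
#[derive(Clone)]
pub struct Name(Repr);

impl Name {
    pub fn new(text: &str) -> Self {
        if text.len() <= INLINE_CAPACITY {
            // Short string: copy the bytes into the value, no allocation.
            let mut buf = [0u8; INLINE_CAPACITY];
            buf[..text.len()].copy_from_slice(text.as_bytes());
            Name(Repr::Inline { len: text.len() as u8, buf })
        } else {
            // Long string: fall back to a heap allocation.
            Name(Repr::Heap(Box::from(text)))
        }
    }

    pub fn as_str(&self) -> &str {
        // Every read has to branch on the representation.
        match &self.0 {
            Repr::Inline { len, buf } => {
                // The first `len` bytes were copied from a valid `&str`.
                std::str::from_utf8(&buf[..*len as usize])
                    .expect("inline bytes are valid UTF-8")
            }
            Repr::Heap(text) => &text[..],
        }
    }

    /// True when the string is stored without a heap allocation.
    pub fn is_inline(&self) -> bool {
        matches!(self.0, Repr::Inline { .. })
    }
}
```

Typical Python identifiers such as `self` or `value` fit within the inline capacity of this sketch, which is the case the PR description is targeting.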
Performance improvement
This change should also reduce peak-memory usage.
Why `CompactString`
`CompactString` shows better performance in the read path than `smol_str`. The only disadvantage compared to `smol_str` is that `smol_str` supports `O(1)` cloning.
I had to update the red-knot symbol table to store references to avoid allocating new strings (it actually already allocated new strings, but we could have removed those allocations when the AST stores `smol_str`).
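The idea of storing references instead of owned copies can be sketched as follows. This is a hypothetical, simplified table and not red-knot's actual symbol table: the `SymbolTable` type, its fields, and the `add_or_get` method are invented for illustration. The table borrows each name from the data that owns it (standing in for the AST) for the lifetime `'ast`, so inserting a symbol never allocates a new string.

```rust
use std::collections::HashMap;

/// Hypothetical symbol table that borrows names from the AST
/// instead of storing its own copies.
pub struct SymbolTable<'ast> {
    ids_by_name: HashMap<&'ast str, usize>,
    names: Vec<&'ast str>,
}

impl<'ast> SymbolTable<'ast> {
    pub fn new() -> Self {
        Self {
            ids_by_name: HashMap::new(),
            names: Vec::new(),
        }
    }

    /// Returns the id for `name`, registering it on first use.
    /// Only the reference is stored, so no string is allocated.
    pub fn add_or_get(&mut self, name: &'ast str) -> usize {
        if let Some(&id) = self.ids_by_name.get(name) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name);
        self.ids_by_name.insert(name, id);
        id
    }

    /// Looks up the borrowed name for a previously returned id.
    pub fn name(&self, id: usize) -> Option<&'ast str> {
        self.names.get(id).copied()
    }
}
```

The cost of this design is the lifetime parameter: the table cannot outlive the data it borrows from, which is the kind of constraint that cheap `O(1)` clones would have avoided.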