Open konstin opened 1 year ago
I think the counterargument is that CPython uses this representation:
>>> print(ast.dump(ast.parse(s)))
Module(body=[Import(names=[alias(name='foo.bar')])], type_ignores=[])
And the ASDL uses the same identifier symbol as in other nodes: https://docs.python.org/3/library/ast.html.
But I care more about ergonomics and performance than about exact compatibility with CPython on these decisions. We would need to see what this makes easier or harder.
@charliermarsh Is my understanding correct that CPython normalizes the identifier name (removes the whitespace)?
I was a bit surprised when I first saw this representation because it's somewhat uncommon (at least in the languages that I have used thus far), especially considering that it re-joins the identifier tokens that the lexer identified.
Are there any upsides in the semantic model to having a single string? I would expect it to be easier to have the individual parts when e.g. resolving imports. The alternative is that we implement a components method, similar to Rust's Path::components, that returns the individual parts (by splitting the string).
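A rough sketch of what such a components method could look like, assuming the path stays stored as one normalized string. The DottedName wrapper below is hypothetical and only borrows the parser's name for illustration; it is not Ruff's actual type.

```rust
/// Hypothetical wrapper around the single-string module path, e.g. "foo.bar".
/// Assumes the string is already normalized (no whitespace around the dots).
pub struct DottedName(String);

impl DottedName {
    pub fn new(name: impl Into<String>) -> Self {
        Self(name.into())
    }

    /// Iterate over the dot-separated segments, loosely analogous to
    /// `std::path::Path::components`.
    pub fn components(&self) -> impl Iterator<Item = &str> + '_ {
        self.0.split('.')
    }
}
```

Every caller that needs the parts pays for the split again, which is one argument for storing the segments directly instead.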
Yeah it's probably an improvement to store a list of dot-separated segments. There is likely no upside in the semantic model since we always decompose into segments.
Let us fix this, regardless of how it gets implemented. Splitting the names in the formatter would be silly, but something we could do.
I think we're at least lucky that the following is not valid
import (a # comment
.b)
I think this was actually fixed; we format it correctly: https://play.ruff.rs/86d1a181-4d27-4bbe-ad13-0c425e1976c0. I think this issue was about changing the AST to better reflect the real structure.
Currently, the path of imports is not formatted, e.g.
remains as-is. This is due to a bug in our AST:
https://github.com/astral-sh/ruff/blob/6824b67f44d8d462f83a727ed8caf100d10c22a6/crates/ruff_python_ast/src/imports.rs#L6-L31
The entire path is represented as a single string, even though it should be a dot-separated list of identifiers (the parser calls it DottedName, but then emits an identifier), especially since an identifier cannot contain dots.
Vec<Identifier> or something similar
Identifier is only used for strings matching the rules
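To illustrate the Vec<Identifier> idea, here is a minimal sketch under stated assumptions. ModulePath, its methods, and the simplified ASCII-only validity check are all hypothetical and not taken from the Ruff codebase; real Python identifiers follow Unicode rules and exclude keywords, which this sketch does not check.

```rust
/// Hypothetical identifier type that can only hold a single name (no dots).
/// The validity check is simplified to ASCII for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identifier(String);

impl Identifier {
    /// Returns `None` for empty strings or strings with non-identifier characters.
    pub fn new(s: &str) -> Option<Self> {
        let mut chars = s.chars();
        let first = chars.next()?;
        let valid_start = first.is_ascii_alphabetic() || first == '_';
        let valid_rest = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
        (valid_start && valid_rest).then(|| Self(s.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Hypothetical import path stored as segments instead of one string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModulePath {
    pub segments: Vec<Identifier>,
}

impl ModulePath {
    /// Split on dots and validate each segment, trimming surrounding whitespace.
    /// Returns `None` if any segment is not a valid identifier.
    pub fn parse(dotted: &str) -> Option<Self> {
        let segments = dotted
            .split('.')
            .map(|s| Identifier::new(s.trim()))
            .collect::<Option<Vec<_>>>()?;
        Some(Self { segments })
    }

    /// Re-join the segments into the normalized dotted form.
    pub fn to_dotted(&self) -> String {
        self.segments
            .iter()
            .map(Identifier::as_str)
            .collect::<Vec<_>>()
            .join(".")
    }
}
```

With this shape, a dotted string can no longer be passed off as an Identifier, and consumers such as the formatter or import resolution can work on the segments without splitting.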