Closed youknowone closed 1 year ago
I'd put in a slight request to make the crate also support Python 2. Bonus points if it can map onto the same Visitor
trait.
We have a use case for parsing Python 2 and Python 3.5+ scraping the same information from either, so one crate handling both languages in an identical (maybe even the exact same) way would be amazing.
@thejcannon Thank you for the comment! Your demand seems showing better why we need a new repository. Aggregating more different use case to it will make the module more organized.
OK, I'd love to contribute to this, but I don't think I'm the best person to own it. So how about:
FooKind
enums to use those.Oh and for naming, I suggest python-parser
so we don't close ourselves out of Python2
I think it will be started with a hard-copy or git-subtree. It will be created under https://github.com/RustPython/parser (or python-parser?) It will include roughly 4 crates - rustpython-common, rustpython-parser, rustpython-ast and rustpython-compiler-core not to create cross-dependency between crates. I think I need to work on rustpython-common a bit to resolve its ambiguous boundary.
At the start of splitting point, there will be no difference between current rustpyhton-parser and the new repository. Once the dependency stuff is done, I think the forking itself can be done very quickly.
So, please don't hesitate to make any change to our current parser. There will be no difference and it will takes time due to preparing.
I'm going to close https://github.com/RustPython/RustPython/pull/4930 and just wait for the new crate. Nothing is pushing it forward other than my own use-case, which I can get from my fork.
And that means we can get it right the first time when it's ready.
👍 No objection from me. I suspect this would benefit Ruff too.
We do depend on a few things in rustpython_common
: rustpython_common::cformat
, rustpython_common::format
, rustpython_common::float_ops
, and rustpython_common::char
, so it'd be nice if those made it over too.
My two cents: it would be reasonable to retain the name rustpython_parser
(and equivalent for other crates) even if the parser is intended to be used outside of RustPython, since these crates originate from RustPython and are part of the RustPython organization + effort.
That will be a point. Ruff is using not only parser but other utilities too. It seems creating a clean single module for Ruff looks unnatural. I am not sure what will the other parts' result be. Formatting stuff? Maybe we can defer the better design of it later once those and Ruff and those part of RustPython be more stablized later.
Still it looks better. Comparably, parser more frequently changes while common doesn't. Ruff will going to use frequently updating new repository for parser and relatively stable other components common. It will let Ruff to skip most of RustPython versions.
I don't think it changes much for ruff whether the parser is part of the RustPython or a standalone repository. Either way, it's a dependency that we have limited control over, and contributing to it requires switching to another project. What's important to us is that it can be used standalone.
Moving the parser out of RustPython and promoting it as a standalone parser may introduce a new tension: How do you plan to balance a stable API vs. progressing the parser? Explaining this as part of a versioning policy may be important.
I'm bringing this up because we have some ideas on how to improve Ruff's parser eventually. Some of these improvements will probably benefit everyone but are breaking changes from an API standpoint:
u64
or f64
where doing so doesn't result in a loss of precision and fall back to a Box<BigInt>
for the rare cases where a BigInt
is necessary (I expect it to be extremely rare to find BigInt
constants in the source code). String
, BigInt
etc, and re-parse them on demand (that's what the ZIG compiler and Rome does) .Other changes are most likely only useful to Ruff or other static-analysis tools:
We don't have any specific plans on if, when or which of these features we want to pursue. I wanted to bring this up for you to decide how you want to position the RustPython
parser and give you a heads up that ruff may, eventually, decide to roll their own parser (because the AST is a core infrastructure).
I don't have specific plan about features, but the key idea is separating parser and other parts of RustPython.
astral-sh:main
couldn't be merged into upstream because it breaks RustPython build. If the parser respository is separated, we can enhance parser first and let RustPython follows it later.
I'd like to share the new repositories' ownership with major stakeholders. I hope it can helps the concern about limited controls. Currently RustPython is upstream and other stakeholders as downstream. With the new repository, everybody will be downstream. We will have more motivation to have fine-grained dependencies and features as downstream.
I hope we just can enhance our poor ast design, but adding a customizable ast feature also could be one of the answer.
for BigInt we are adapting it to malachite witch has small-int optimize internally.
I hope we just can enhance our poor ast design, but adding a customizable ast feature also could be one of the answer.
I'm sorry if that's what came across. I only wanted to say that different tools may have different requirements and may, therefore, decide to model the AST differently. RustPython's AST isn't poorly designed and is a great fit for many tools.
I'd like to share the new repositories' ownership with major stakeholders. I hope it can helps the concern about limited controls. Currently RustPython is upstream and other stakeholders as downstream. With the new repository, everybody will be downstream. We will have more motivation to have fine-grained dependencies and features as downstream.
That makes sense to me and would be beneficial for Ruff :)
What do you think about CPython API compatibility? This has come up in one of my PRs and is not really a concern for Ruff.
I didn't look them in deep yet. But the rough idea is rust-level ast doesn't necessarily are same as python-level ast. Our parser and (rust) ast can have more information then what (c)python ast has. And it actually does already a bit. RustPython can make python compatibility on python side or rust-python transition level. I hope I have more time to look in it tomorrow.
So I started playing around with the tree sitter rust binding and I think we might switch to that.
The interface isn't as easy to use (have to use their query language) but it supports Python 2, and keeps the door open for us to reuse the architecture for JVM languages and others
I'll keep my visitor changes around if you want so we can apply them to the new parser crate(s)
@thejcannon Thank you so much! That will be very helpful 😄 I am planning to work on this issue soon.
format/cformat seems reasonable to be placed out of common crate.
Currently they are depending on BorrowedStr
which is tightly coupled with vm::PyStr. It will require a bit of design works engineering to decouple them from rustpython-vm implementation details.
New repository is created: https://github.com/RustPython/Parser
I created a new team @RustPython/parser and invited @charliermarsh
@charliermarsh please feel free to add any qualified parser contributors. maybe @MichaReiser ?
Thank you everyone. Though there are still remaining subtle issues, the core split is done!
Summary
Recently, rustpython-parser has definitely a lot more users than RustPython itself. I suggest splitting parser out from RustPython repository.
Detailed Explanation
When upstream is comparably smaller than downstream, downstream goes to make fragmentation rather than contributing to upstream.
The problem here is the user expectation and the project structure has opposite ratio.
In the project RustPython, parser is comparably small module - mostly due to huge stdlib of Python. For rustpython-parser users, only rustpyhon-parser is useful but other parts is useless burden.
I'd like to suggest splitting parser out from RustPython repository. it will be one of the good way to keep rust-implemented python parser in single repository with minimum fragmentation.
Drawbacks, Rationale, and Alternatives
RustPython will use the separated parser with git tag. It will increase cost of maintenance of other parts of RustPython.
Unresolved Questions
Not to restrict the parser's potent to just RustPython, the name even can be something different.
python-parser
orpython3-parser
.