RustPython / RustPython

A Python Interpreter written in Rust
https://rustpython.github.io
MIT License
18.06k stars 1.2k forks source link

[RFC] Splitting parser out of RustPython repository #4941

Closed youknowone closed 1 year ago

youknowone commented 1 year ago

Summary

Recently, rustpython-parser has definitely a lot more users than RustPython itself. I suggest splitting parser out from RustPython repository.

Detailed Explanation

When upstream is comparably smaller than downstream, downstream goes to make fragmentation rather than contributing to upstream.

The problem here is the user expectation and the project structure has opposite ratio.

In the project RustPython, parser is comparably small module - mostly due to huge stdlib of Python. For rustpython-parser users, only rustpyhon-parser is useful but other parts is useless burden.

I'd like to suggest splitting parser out from RustPython repository. it will be one of the good way to keep rust-implemented python parser in single repository with minimum fragmentation.

Drawbacks, Rationale, and Alternatives

RustPython will use the separated parser with git tag. It will increase cost of maintenance of other parts of RustPython.

Unresolved Questions

Not to restrict the parser's potent to just RustPython, the name even can be something different. python-parser or python3-parser.

thejcannon commented 1 year ago

I'd put in a slight request to make the crate also support Python 2. Bonus points if it can map onto the same Visitor trait.

We have a use case for parsing Python 2 and Python 3.5+ scraping the same information from either, so one crate handling both languages in an identical (maybe even the exact same) way would be amazing.

youknowone commented 1 year ago

@thejcannon Thank you for the comment! Your demand seems showing better why we need a new repository. Aggregating more different use case to it will make the module more organized.

thejcannon commented 1 year ago

OK, I'd love to contribute to this, but I don't think I'm the best person to own it. So how about:

thejcannon commented 1 year ago

Oh and for naming, I suggest python-parser so we don't close ourselves out of Python2

youknowone commented 1 year ago

I think it will be started with a hard-copy or git-subtree. It will be created under https://github.com/RustPython/parser (or python-parser?) It will include roughly 4 crates - rustpython-common, rustpython-parser, rustpython-ast and rustpython-compiler-core not to create cross-dependency between crates. I think I need to work on rustpython-common a bit to resolve its ambiguous boundary.

At the start of splitting point, there will be no difference between current rustpyhton-parser and the new repository. Once the dependency stuff is done, I think the forking itself can be done very quickly.

So, please don't hesitate to make any change to our current parser. There will be no difference and it will takes time due to preparing.

thejcannon commented 1 year ago

I'm going to close https://github.com/RustPython/RustPython/pull/4930 and just wait for the new crate. Nothing is pushing it forward other than my own use-case, which I can get from my fork.

And that means we can get it right the first time when it's ready.

charliermarsh commented 1 year ago

👍 No objection from me. I suspect this would benefit Ruff too.

We do depend on a few things in rustpython_common: rustpython_common::cformat, rustpython_common::format, rustpython_common::float_ops, and rustpython_common::char, so it'd be nice if those made it over too.

My two cents: it would be reasonable to retain the name rustpython_parser (and equivalent for other crates) even if the parser is intended to be used outside of RustPython, since these crates originate from RustPython and are part of the RustPython organization + effort.

youknowone commented 1 year ago

That will be a point. Ruff is using not only parser but other utilities too. It seems creating a clean single module for Ruff looks unnatural. I am not sure what will the other parts' result be. Formatting stuff? Maybe we can defer the better design of it later once those and Ruff and those part of RustPython be more stablized later.

Still it looks better. Comparably, parser more frequently changes while common doesn't. Ruff will going to use frequently updating new repository for parser and relatively stable other components common. It will let Ruff to skip most of RustPython versions.

MichaReiser commented 1 year ago

I don't think it changes much for ruff whether the parser is part of the RustPython or a standalone repository. Either way, it's a dependency that we have limited control over, and contributing to it requires switching to another project. What's important to us is that it can be used standalone.

Moving the parser out of RustPython and promoting it as a standalone parser may introduce a new tension: How do you plan to balance a stable API vs. progressing the parser? Explaining this as part of a versioning policy may be important.

I'm bringing this up because we have some ideas on how to improve Ruff's parser eventually. Some of these improvements will probably benefit everyone but are breaking changes from an API standpoint:

Other changes are most likely only useful to Ruff or other static-analysis tools:

We don't have any specific plans on if, when or which of these features we want to pursue. I wanted to bring this up for you to decide how you want to position the RustPython parser and give you a heads up that ruff may, eventually, decide to roll their own parser (because the AST is a core infrastructure).

youknowone commented 1 year ago

I don't have specific plan about features, but the key idea is separating parser and other parts of RustPython.

astral-sh:main couldn't be merged into upstream because it breaks RustPython build. If the parser respository is separated, we can enhance parser first and let RustPython follows it later.

I'd like to share the new repositories' ownership with major stakeholders. I hope it can helps the concern about limited controls. Currently RustPython is upstream and other stakeholders as downstream. With the new repository, everybody will be downstream. We will have more motivation to have fine-grained dependencies and features as downstream.

I hope we just can enhance our poor ast design, but adding a customizable ast feature also could be one of the answer.

qingshi163 commented 1 year ago

for BigInt we are adapting it to malachite witch has small-int optimize internally.

MichaReiser commented 1 year ago

I hope we just can enhance our poor ast design, but adding a customizable ast feature also could be one of the answer.

I'm sorry if that's what came across. I only wanted to say that different tools may have different requirements and may, therefore, decide to model the AST differently. RustPython's AST isn't poorly designed and is a great fit for many tools.

I'd like to share the new repositories' ownership with major stakeholders. I hope it can helps the concern about limited controls. Currently RustPython is upstream and other stakeholders as downstream. With the new repository, everybody will be downstream. We will have more motivation to have fine-grained dependencies and features as downstream.

That makes sense to me and would be beneficial for Ruff :)

What do you think about CPython API compatibility? This has come up in one of my PRs and is not really a concern for Ruff.

youknowone commented 1 year ago

I didn't look them in deep yet. But the rough idea is rust-level ast doesn't necessarily are same as python-level ast. Our parser and (rust) ast can have more information then what (c)python ast has. And it actually does already a bit. RustPython can make python compatibility on python side or rust-python transition level. I hope I have more time to look in it tomorrow.

thejcannon commented 1 year ago

So I started playing around with the tree sitter rust binding and I think we might switch to that.

The interface isn't as easy to use (have to use their query language) but it supports Python 2, and keeps the door open for us to reuse the architecture for JVM languages and others

I'll keep my visitor changes around if you want so we can apply them to the new parser crate(s)

youknowone commented 1 year ago

@thejcannon Thank you so much! That will be very helpful 😄 I am planning to work on this issue soon.

youknowone commented 1 year ago

format/cformat seems reasonable to be placed out of common crate. Currently they are depending on BorrowedStr which is tightly coupled with vm::PyStr. It will require a bit of design works engineering to decouple them from rustpython-vm implementation details.

youknowone commented 1 year ago

New repository is created: https://github.com/RustPython/Parser

youknowone commented 1 year ago

I created a new team @RustPython/parser and invited @charliermarsh

@charliermarsh please feel free to add any qualified parser contributors. maybe @MichaReiser ?

youknowone commented 1 year ago

Thank you everyone. Though there are still remaining subtle issues, the core split is done!