astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
32.71k stars 1.09k forks

Expose Ruff's public API as a Python library #659

Open charliermarsh opened 2 years ago

charliermarsh commented 2 years ago

See: #593

facundobatista commented 1 year ago

This would mean that we could do something like the following, right?

import ruff
errors = ruff.check_files(list_of_paths)
...

Thanks!

charliermarsh commented 1 year ago

Yup, that's right!

charliermarsh commented 1 year ago

@messense - I wasn't certain on this last time -- if we bundle a Python API with Ruff, will we need to build separate wheels for every Python version?

messense commented 1 year ago

If you can use abi3 features, one wheel per platform, otherwise you need to build separate wheels for every Python version.
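As a rough illustration of that distinction (not taken from Ruff's build setup): the ABI tag in a wheel's filename shows whether it targets the stable ABI. The sketch below reads that tag from a filename following the standard wheel naming convention. The package name and both filenames are made up for the example.

```python
def wheel_abi_tag(filename: str) -> str:
    """Return the ABI tag of a wheel filename.

    Wheel filenames follow
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the ABI tag is always the second-to-last dash-separated field.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return parts[-2]


def is_abi3(filename: str) -> bool:
    """True if the wheel targets the stable ABI (one wheel per platform)."""
    return wheel_abi_tag(filename) == "abi3"


# Hypothetical filenames, for illustration only.
ABI3_WHEEL = "example_pkg-1.0.0-cp37-abi3-manylinux_2_17_x86_64.whl"
VERSIONED_WHEEL = "example_pkg-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl"
```

An abi3 wheel like the first one can be installed on any CPython at or above the version in its Python tag, while the second one only matches CPython 3.11.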

charliermarsh commented 1 year ago

Awesome thank you. I think we should be able to do that, so maybe this will be really straightforward.

provinzkraut commented 1 year ago

Hello there, do you happen to have a rough timeline for when (if?) this is going to happen? I'm looking to integrate ruff into a tool I'm developing, which would require an API of some sort. It would be very helpful to know if this is something I can wait on, or look for another solution / workaround!

charliermarsh commented 1 year ago

@provinzkraut - It's definitely going to happen! I could probably ship it within the next week or so. I'd just been punting on it until I had more people asking for it.

Could I hear a bit more about your use-case, if you don't mind sharing?

provinzkraut commented 1 year ago

@charliermarsh That's good to hear!

Could I hear a bit more about your use-case, if you don't mind sharing?

Sure. I'm working on a markdown extension to automatically generate pymdown tabs for different Python versions from a source version, i.e. generate 3.7, 3.8, 3.10 tabs from a 3.7 source (repo).

Currently I'm using pyupgrade to generate the versions and autoflake to clean up imports that have become superfluous. autoflake in particular is quite slow, making up the majority of the extension's runtime. Since ruff is way faster at this, I'd like to use it (also one less dependency). I fiddled around with using the CLI version, but that's messy and a performance degradation.

charliermarsh commented 1 year ago

@provinzkraut - Ok, cool. Let me see what I can do. I don't know if you're comfortable reading Rust, but would the current Rust public API suit your use-case, were it callable from Python with Python objects etc.?

charliermarsh commented 1 year ago

In short: it takes a file path (to find the pyproject.toml), the raw Python source code, and an autofix setting, and returns a list of checks (which themselves include the raw fixes / patches).

I'm guessing that for your use-case, what you actually want is a function that takes source code (plus settings, to enable a list of checks) and returns fixed source code?

provinzkraut commented 1 year ago

but would the current Rust public API suit your use-case, were it callable from Python with Python objects etc.?

I looked at this yesterday because I thought that maybe it could be as simple as adding a tiny wrapper around the Rust lib myself, but it seems to be a bit more involved. The current API doesn't really lend itself that well to my use case.

I'm guessing that for your use-case, what you actually want is a function that takes source code (plus settings, to enable a list of checks) and returns fixed source code?

That would be ideal, yes. Dealing with a list of checks and extracting what I need from it also wouldn't be that big of an issue, but passing in configuration directly and omitting the config file is crucial, both for the needed configurability (I need to run the fixers with varying configuration for every invocation) and performance (I'm running the fixers many times on small snippets, which means the overhead of looking for and parsing a pyproject.toml every time adds up).

squiddy commented 1 year ago

I'm working on this now.

phillipuniverse commented 1 year ago

I had a need to execute Ruff as an Alembic post write hook. I came up with a very ham-fisted approach, based on what I found in the distributed __main__.py

alembic.ini:

[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts.  See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
hooks = ruff, black

ruff.type = ruff

black.type = console_scripts
black.entrypoint = black

and then Alembic's env.py:

import os
import sysconfig
from alembic.script import write_hooks

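@write_hooks.register("ruff")
def run_ruff(filename, options):
    # Locate the ruff executable installed alongside the current interpreter.
    ruff = os.path.join(sysconfig.get_path("scripts"), "ruff")
    # Fix the new revision file in place; --exit-zero keeps lint findings from failing the hook.
    os.spawnv(os.P_WAIT, ruff, [ruff, filename, "--fix", "--exit-zero"])

charliermarsh commented 1 year ago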

👍 Yup that should be safe to do! (The downside being that you have to go through the CLI rather than calling a function directly. Hoping to enable that soon but not working on it right now.)

asmeurer commented 1 year ago

Is the plan here to make a Python library that links to ruff directly? I want something that I can use in an interactive Python REPL to check for errors as the user types stuff, and shelling out to a subprocess on each character typed doesn't sound like a good idea (especially if I also have to write out the code to a tempfile or heredoc).

If you're curious, here's what I'm currently using with pyflakes https://github.com/asmeurer/mypython/blob/a836d0956a6443f7a85a032dc625ff3da1479a91/mypython/processors.py#L196. The code is complicated in part because pyflakes doesn't handle syntax errors very well, so I have to parse them separately. I haven't checked if ruff handles them better. There are a lot of opportunities to improve over pyflakes' barebones Python API.

The main thing I would want from a Ruff Python API is a function that takes a string of Python code and returns a list of errors with line number, start and stop column numbers (where relevant), and the error message. Being able to get corresponding fixes would be nice too, I guess. The best API I can think of for a "fix" would be to return the whole block of code with the specific warning fixed, along with a new line and column number corresponding to the line and column of the original warning (so that I can interactively keep the cursor in the "same" location).

I'm happy to discuss API ideas more in depth or test out any prototypes if you're interested.
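To make the shape of that request concrete, here is a minimal sketch. Every name in it (Diagnostic, FixResult, apply_fix) is hypothetical and not part of Ruff. It shows one way a single fix could return the whole edited block together with the remapped position of the original warning.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Diagnostic:
    """One reported problem; lines are 1-based, columns are 0-based."""
    line: int
    start_col: int
    end_col: Optional[int]  # None where an end column is not relevant
    message: str


@dataclass(frozen=True)
class FixResult:
    """The full source after one fix, plus where the tracked position ended up."""
    source: str
    line: int
    col: int


def offset_to_line_col(source: str, offset: int) -> Tuple[int, int]:
    """Convert a character offset into a (1-based line, 0-based column) pair."""
    line = source.count("\n", 0, offset) + 1
    col = offset - (source.rfind("\n", 0, offset) + 1)
    return line, col


def apply_fix(source: str, start: int, end: int, replacement: str,
              position: int) -> FixResult:
    """Replace source[start:end] and remap `position` into the new text."""
    fixed = source[:start] + replacement + source[end:]
    if position <= start:
        moved = position  # before the edit: unchanged
    elif position >= end:
        moved = position + len(replacement) - (end - start)  # after: shifted
    else:
        moved = start + len(replacement)  # inside the edit: end of replacement
    line, col = offset_to_line_col(fixed, moved)
    return FixResult(fixed, line, col)


# Example: drop an unused import and follow a position on the print() line.
SOURCE = "import os\nimport sys\nprint(sys.argv)\n"
RESULT = apply_fix(SOURCE, 0, len("import os\n"), "", SOURCE.index("print"))
```

Working in character offsets keeps the remapping to a single comparison, and the line and column are derived only at the end.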

MichaReiser commented 1 year ago

That sounds cool!

If you're curious, here's what I'm currently using with pyflakes asmeurer/mypython@a836d09/mypython/processors.py#L196. The code is complicated in part because pyflakes doesn't handle syntax errors very well, so I have to parse them separately. I haven't checked if ruff handles them better. There are a lot of opportunities to improve over pyflakes' barebones Python API.

Ruff creates a diagnostic for files with syntax errors. Adopting a more error-resilient parser is something we are considering.

The main thing I would want from a Ruff Python API is a function that takes a string of Python code and returns a list of errors with line number, start and stop column numbers (where relevant), and the error message.

That sounds reasonable, but we aren't there yet (your best shot is to call into the CLI). One of the biggest problems of exposing a linter API right now is that Ruff writes one-off warnings to stdout and relies on the global state to track whether to write the warning. Cleaning this up probably requires a larger refactoring around the diagnostic system... so that may take a while.

charliermarsh commented 1 year ago

For what it's worth, we power ruff-lsp and the VS Code extension over subprocess, and the CLI actually supports enough behavior to power the operations needed there. For example, you can use --format json to get a structured list of violations and their fixes. Similarly, if you pass input via stdin, and run with --fix, we print the "fixed" output to stdout.
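A minimal sketch of that subprocess approach, assuming a `ruff` executable on PATH. The comment above spells the flag `--format json`; newer releases may name it `--output-format`, so the sketch takes the flag name as a parameter. The helper names and the `snippet.py` placeholder are illustrative, and the shape of the JSON entries should be checked against the installed version.

```python
import json
import subprocess
from typing import Callable, List, Sequence

# A runner takes (command, stdin text) and returns the process's stdout.
Runner = Callable[[Sequence[str], str], str]


def run_cli(cmd: Sequence[str], stdin_text: str) -> str:
    """Run the command, feed it stdin_text, and return its stdout."""
    proc = subprocess.run(
        list(cmd), input=stdin_text, capture_output=True, text=True, check=False
    )
    return proc.stdout


def check_source(source: str, *, format_flag: str = "--output-format",
                 runner: Runner = run_cli) -> List[dict]:
    """Lint source passed via stdin and return the parsed JSON violations."""
    cmd = ["ruff", "check", format_flag, "json",
           "--stdin-filename", "snippet.py", "-"]
    out = runner(cmd, source)
    return json.loads(out) if out.strip() else []


def fix_source(source: str, *, runner: Runner = run_cli) -> str:
    """Pass source via stdin with --fix and return the fixed text from stdout."""
    cmd = ["ruff", "check", "--fix", "--exit-zero",
           "--stdin-filename", "snippet.py", "-"]
    return runner(cmd, source)
```

The `runner` parameter exists only so the wrapper can be exercised without Ruff installed; in normal use the default is enough, for example `fix_source("import os\n")`.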

provinzkraut commented 1 year ago

For what it's worth, we power ruff-lsp and the VS Code extension over subprocess, and the CLI actually supports enough behavior to power the operations needed there

How do you feel about adding a Python module that wraps this up in a convenient API?

I've been using the solution you suggested in a few of my tools now, and not having to implement that boilerplate every time would certainly be nice.

If that sounds good to you, I'd be happy to contribute.

Zac-HD commented 1 year ago

I'm keen to replace autoflake+isort with ruff in my shed all-in-one autoformatter - the subprocess trick works pretty well, except that if there's any way to change the isort settings in ruff without a config file I can't see it - and running isolated from any config is pretty important in this use-case. Any suggestions, or do I just need to wait for the library interface in this issue?

pawamoy commented 8 months ago

Adding a data point: in mkdocstrings-python we format function signatures with Black if it is installed. We would like to support Ruff too, but spawning a subprocess for each signature is very costly, so we would greatly appreciate a Python binding that doesn't use subprocesses 🙂 A wrapper that hides the subprocess calls sounds nice, but won't be enough for our use-case.

MichaReiser commented 8 months ago

@pawamoy that sounds neat. We plan to integrate our LSP into ruff (implemented in Rust). I know it's not as convenient as a Python API, but it would allow you to format files without spawning a process for every signature (although it might still be very costly, because it requires multiple LSP calls to format a single code snippet).

pawamoy commented 8 months ago

By calls do you mean network calls? Or could we somehow spawn the LSP server locally (like a daemon)?

MichaReiser commented 8 months ago

You would spawn the LSP like a daemon and communicate over stdin/stdout.
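For anyone unfamiliar with how that works: LSP messages sent over stdin/stdout are JSON-RPC payloads framed with a Content-Length header. The sketch below only shows that framing, with helper names of my own choosing. How the Ruff server is launched is not covered here, and the file URI is a placeholder.

```python
import io
import json
from typing import BinaryIO, Optional


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one framed message from a binary stream; None at end of stream."""
    content_length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # stream closed before a full header arrived
        if line in (b"\r\n", b"\n"):
            break  # blank line ends the header section
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(stream.read(content_length).decode("utf-8"))


# A standard LSP formatting request; the URI is a placeholder.
REQUEST = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/formatting",
    "params": {
        "textDocument": {"uri": "file:///tmp/snippet.py"},
        "options": {"tabSize": 4, "insertSpaces": True},
    },
}

# Round trip through an in-memory stream standing in for the server's pipes.
ROUNDTRIP = read_message(io.BytesIO(encode_message(REQUEST)))
```

With a real server, the bytes from `encode_message` would be written to the daemon's stdin and `read_message` would be called on its stdout, after the usual initialize handshake and after opening the document.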

pawamoy commented 8 months ago

Ah, interesting. Then yeah, that's already much better than subprocesses 🙂 Thanks for the info!

amyreese commented 8 months ago

Adding a data point: in mkdocstrings-python we format function signatures with Black if it is installed. We would like to support Ruff too, but spawning a subprocess for each signature is very costly, so we would greatly appreciate a Python binding that doesn't use subprocesses 🙂 A wrapper that hides the subprocess calls sounds nice, but won't be enough for our use-case.

I put together an experimental package that uses PyO3 to wrap the Ruff formatter in a Python API that doesn't require any subprocesses. I'd still consider it alpha at best (there's only one callable function), but maybe it could be helpful to others as well?

https://github.com/amyreese/ruff-api

pawamoy commented 8 months ago

Amazing, thanks for sharing! I'll check it out :)

Zac-HD commented 8 months ago

@charliermarsh just checking in - is there any way to configure the isort settings in --isolated mode, or do I just have to wait? No worries if so, I'm just looking forward to replacing black too...

AlexWaygood commented 8 months ago

@charliermarsh just checking in - is there any way to configure the isort settings in --isolated mode, or do I just have to wait? No worries if so, I'm just looking forward to replacing black too...

@Zac-HD, yes, there is! We recently extended the --config flag so that arbitrary configuration options can be overridden via the command line using "inline TOML": https://docs.astral.sh/ruff/configuration/#the-config-cli-flag. So to override the isort extra-standard-library setting in --isolated mode (for example), you'd do something like ruff check path/to/file.py --config "lint.isort.extra-standard-library = ['path']".
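For tools that build this command programmatically, a small sketch of assembling the overrides. The helper names are hypothetical, and the value formatting only covers booleans, numbers, ordinary strings, and lists, which is enough for settings like the one above.

```python
import json
from typing import Any, List, Mapping


def toml_value(value: Any) -> str:
    """Render a simple Python value as an inline TOML value."""
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # JSON-style quoting and escapes are valid TOML basic strings
        # for ordinary text.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(toml_value(item) for item in value) + "]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def isolated_check_command(path: str, overrides: Mapping[str, Any]) -> List[str]:
    """Build a `ruff check` command that ignores config files and applies
    each override through its own --config flag."""
    cmd = ["ruff", "check", path, "--isolated"]
    for key, value in overrides.items():
        cmd += ["--config", f"{key} = {toml_value(value)}"]
    return cmd


COMMAND = isolated_check_command(
    "path/to/file.py",
    {"lint.isort.extra-standard-library": ["path"]},
)
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with the embedded TOML.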

mbelak-dtml commented 8 months ago

Adding another data point: In edvart, we are currently using isort to sort imports in Python code which is being dynamically generated. With a Python API, we could fully switch to ruff. For now, we are using ruff to format the source code, but keeping isort to format the generated code.

jankatins commented 5 months ago

Another data point: it would make it easier to replace programmatic calls to black, like in mdformat-black: https://github.com/hukkin/mdformat-black/blob/master/mdformat_black/__init__.py


def format_python(unformatted: str, _info_str: str) -> str:
    return black.format_str(unformatted, mode=black.Mode())

adamchainz commented 4 months ago

I’d want to use an API like black.format_str over in blacken-docs, where Ruff support is tracked in this issue: https://github.com/adamchainz/blacken-docs/issues/352

MichaReiser commented 3 months ago

Not the most elegant solution and I haven't tried it myself, but it should soon be possible to call the ruff WASM API from Python:

Considering that we have a WASM API now, I'm open to reconsidering a PyO3 API. Let me discuss this internally.

Note: The API would not fall under any semver guarantees. We expect a major breaking change once we introduce multifile analysis. Practically, the API hasn't changed in months.

MichaReiser commented 3 months ago

I would be open to expose a Ruff Pyo3 API:

I'm happy to support if anyone's interested in contributing the API to ruff.

maxschulz-COL commented 3 months ago

Hey, just to add another data-point: we at Vizro would also love to be able to invoke ruff from within python without subprocess. Something like black.format_str indeed :)

n8henrie commented 2 months ago

Same here -- would love to replace black.format_str in automatically formatting jupyter cells with https://github.com/n8henrie/jupyter-black/ !

amyreese commented 2 months ago

@maxschulz-COL @n8henrie @adamchainz I would like to remind folks that https://github.com/amyreese/ruff-api has a working, simple API wrapping both the formatter and import sorter from Ruff, just a pip install ruff-api away. We have been using it to successfully migrate from Black in our monorepo while still maintaining our existing integrations/tooling written in Python. :)

analog-cbarber commented 2 months ago

That looks great, but it is also documented as "highly experimental", so people may be reluctant to add that to their tool chains. Why don't you contribute that to the ruff project?

n8henrie commented 2 months ago

Agreed -- it would be great to have this under ruff's umbrella!