astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
28.75k stars, 933 forks

Support text Jupyter notebooks created with Jupytext #8800

Open owenlamont opened 7 months ago

owenlamont commented 7 months ago

I have a use case for Ruff and the Ruff formatter that is related to some of the other Markdown / docstring feature requests: I'd like to run both on Jupyter notebooks that have been exported to Markdown with Jupytext.

The company I'm at prefers converting notebooks to Markdown as it makes the notebook diffs much easier to read on Bitbucket (which doesn't support any notebook rendering/diffing like GitHub).

At first I noticed I could add markdown as a target file type for the Ruff formatter and linter, which got my hopes up that this would just work:

  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.1.6
    hooks:
      - id: ruff-format
        types_or: [python, pyi, jupyter, markdown]
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
        types_or: [python, pyi, jupyter, markdown]

But when I ran Ruff I saw that it fails to parse the markdown properly. I had hoped it would just run on the Python code blocks, in the same way it parses Jupyter notebook cells, and ignore all the other markdown content, but it's obviously trying to parse all of the markdown, e.g.

[image]

dhruvmanila commented 7 months ago

Hey, we currently don't have support for linting / formatting Python code in markdown blocks (https://github.com/astral-sh/ruff/issues/8237, https://github.com/astral-sh/ruff/issues/3792). I'll close this in favor of the markdown issue for bookkeeping purposes as I think that should solve this but correct me if I'm wrong here.

dhruvmanila commented 7 months ago

Hmm, actually it would be a bit different. For markdown we wouldn't need the concatenated source code from all code blocks, but if it's a notebook converted to markdown then I think each block should have context from the other code blocks? @owenlamont Do you think this is true?

Another solution currently that I can think of is to lint / format before converting it to markdown. I'm not sure how feasible this would be given my lack of knowledge about your setup.

owenlamont commented 7 months ago

Hi @dhruvmanila - yeah, it would have to have the concatenated source code. I can see Ruff still tracks which code was in which Jupyter cell when raising warnings, so if it could treat the code blocks exactly as Jupyter cells are treated that would be ideal.

As a work-around it could be exported to ipynb, linted and formatted, then re-exported to markdown - but that would be onerous. When working with Jupytext the notebook never gets persisted (in any permanent/visible way) as an ipynb - it gets loaded from Markdown and saved back to Markdown.

The ideal solution (from my perspective) would be to parse the YAML front matter of the Markdown, identify it as Jupytext-generated Markdown, then recognise that the code blocks need to be concatenated and treated as notebook cells. I totally understand if this use case is too niche to justify the effort though. I can't speak much as to how many people use this format - as a repo jupytext is relatively popular (around 6k users - I recognise some relatively prominent Jupyter developers as contributors).
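
To make the idea concrete, here is a rough Python sketch of the detection and concatenation steps. It is not how Ruff or Jupytext work internally. The function names are made up for illustration, the front matter check is a crude string match rather than a real YAML parse, and it only recognises plain python fences (not MyST code-cell directives).

```python
import re

FENCE = "`" * 3  # built this way to avoid writing a literal fence in this example


def is_jupytext_markdown(text: str) -> bool:
    """Crude heuristic: YAML front matter that mentions a `jupytext:` key."""
    match = re.match(r"\A---\n(.*?)\n---\n", text, re.DOTALL)
    return bool(match) and "jupytext:" in match.group(1)


def extract_cells(text: str) -> list[tuple[int, str]]:
    """Return (first_source_line, source) for each python fenced block, in order."""
    cells = []
    current = None  # lines of the block being read, or None when outside a block
    start = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if current is None:
            if line.strip() == FENCE + "python":
                current, start = [], lineno + 1
        elif line.strip() == FENCE:
            cells.append((start, "\n".join(current)))
            current = None
        else:
            current.append(line)
    return cells


def concatenate(cells: list[tuple[int, str]]) -> str:
    """Join the cell sources so later cells can see names defined in earlier ones."""
    return "\n\n".join(source for _, source in cells)
```

Keeping the starting line of each block alongside its source is what would let a tool report a diagnostic against the right place in the Markdown file, much like diagnostics are reported per cell for ipynb files.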

tvatter commented 5 months ago

There's a similar request for quarto notebooks (#6140), and generally for Python code included in Markdown code blocks (#3792).

davidorme commented 1 month ago

I think I'm looking at the same issue. We also use MyST Markdown notebooks in several projects and have been using the jupytext ability to pipe code through black to apply Python formatting:

$ jupytext --pipe black docs/source/users/pmodel/c3c4model.md 
[jupytext] Reading docs/source/users/pmodel/c3c4model.md in format md
[jupytext] Executing black -
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.
[jupytext] Writing docs/source/users/pmodel/c3c4model.md in format md:myst

We have that as part of a pre-commit hook to ensure that the code in our notebooks is properly formatted.

  - repo: https://github.com/mwouts/jupytext
    rev: v1.16.2
    hooks:
    - id: jupytext
      args: [--pipe, black]
      files: docs/source 
      additional_dependencies:
        - black==24.4.2 # Matches hook

I haven't been able to work out exactly what happens, but the jupytext.cli module provides a pipe_notebook function that is used to round trip something (I think it must be just the code cell contents?) through black.

davidorme commented 4 weeks ago

OK - so jupytext converts the notebook to percent format, which is a Python file with the markdown content stored as comments. black can then run on the code alone and the result can be converted back to the original format.
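
For anyone unfamiliar with percent format, the layout can be sketched roughly as below. This only illustrates the shape of the file and is not Jupytext's own code. The `to_percent` helper and the `(kind, source)` cell tuples are hypothetical, and cell metadata is ignored.

```python
def to_percent(cells: list[tuple[str, str]]) -> str:
    """Render (kind, source) cells as a percent-format Python script."""
    chunks = []
    for kind, source in cells:
        if kind == "markdown":
            # Markdown cells become comment lines under a "[markdown]" marker.
            body = "\n".join(("# " + line).rstrip() for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + body)
        else:
            # Code cells are kept as plain Python under a bare "# %%" marker.
            chunks.append("# %%\n" + source)
    return "\n\n".join(chunks) + "\n"


script = to_percent([("markdown", "# Title\n\nIntro."), ("code", "x = 1")])
print(script)
```

Because the markdown ends up as comments, the whole file is valid Python, which is why a formatter that only understands Python can be run over it.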