astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License

Unexpected token '>>' when linting Jupyter notebook with doctest/interpreter snippet #9189

Open mattharrison opened 10 months ago

mattharrison commented 10 months ago

I'm trying to lint notebooks that have interactive snippets in the cells.

To recreate, simply make a notebook with something like this in the cell:

>>> def function(a,b):
...    return a+b
>>> function(1,2)

Ruff (v 0.1.8) fails with this error:

% ruff foo.ipynb
error: Failed to parse foo.ipynb:cell 1:1:1: Unexpected token '>>'
foo.ipynb:cell 1:1:1: E999 SyntaxError: Unexpected token '>>'
Found 1 error.
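
For anyone who wants to reproduce this without opening Jupyter, the following is a minimal sketch that writes a one-cell notebook containing the snippet above. The helper names (`build_notebook`, `write_notebook`) are made up for illustration, and the assumption that a bare nbformat 4 skeleton with empty metadata is enough for Ruff to read the file has not been verified against every Ruff version.

```python
import json
from pathlib import Path

# Doctest-style cell source, exactly as pasted in the report above.
CELL_SOURCE = [
    ">>> def function(a,b):\n",
    "...    return a+b\n",
    ">>> function(1,2)",
]


def build_notebook(source_lines):
    """Return a minimal nbformat 4 notebook dict with a single code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source_lines,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(path, source_lines=CELL_SOURCE):
    """Serialize the notebook as JSON and return the path that was written."""
    path = Path(path)
    path.write_text(json.dumps(build_notebook(source_lines), indent=1), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Afterwards, run `ruff foo.ipynb` as in the report.
    write_notebook("foo.ipynb")
```

Running the script and then `ruff foo.ipynb` should hit the same parse path as a notebook saved from Jupyter, since the cell source is stored as plain strings either way.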

dhruvmanila commented 10 months ago

Can you expand on what "interactive snippets" means? Is the code snippet you mentioned meant to be pasted as is, including the >>> and ... prompts?

mattharrison commented 10 months ago

Yes, it means writing code in a doctest style. Jupyter supports this style of code and I would love to be able to format it. Here's a screenshot of my notebook; it is literally a cell with the code I pasted above.

[Screenshot: Screen Shot 2023-12-18 at 9 56 19 AM]

dhruvmanila commented 10 months ago

TIL, thanks for providing the context. Can you further expand on your use-case for this? I might be misunderstanding you or missing some context but wouldn't using the interactive cells of the notebook itself be better instead? So, what I mean is the following:

[Screenshot: Screenshot 2023-12-19 at 10 54 35]

mattharrison commented 10 months ago

My use case is I write books in Jupyter and want to format the code. As to why I use interpreter snippets? So the reader can distinguish between code and output. (And so I can test the code in my book.)

dhruvmanila commented 10 months ago

Thanks for providing this information! I think it can be compared to the way Python's official tutorials are written (for example https://docs.python.org/3/tutorial/controlflow.html#if-statements) where the text would be a markdown block while the code snippets would be a code block.

As for the implementation side, maybe we could utilize some parts of the new docstring formatter to get the code part from the interactive snippets (cc @BurntSushi). We could check if the first line begins with the >>> prompt and use an intermediary step to extract the code parts. I can look into it sometime next year (after my Christmas / New Year break) ;)

And, this would work for both the linter and the formatter once implemented.
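
To make the idea concrete, here is a rough sketch of that intermediary step, written in plain Python and not taken from Ruff's docstring formatter. The function names are hypothetical, and it assumes that every line without a `>>>` or `...` prompt is interpreter output and can be dropped.

```python
PS1 = ">>>"  # primary interpreter prompt
PS2 = "..."  # continuation prompt


def looks_like_interactive(cell_source: str) -> bool:
    """Return True if the first non-blank line starts with the >>> prompt."""
    for line in cell_source.splitlines():
        if line.strip():
            return line.lstrip().startswith(PS1)
    return False


def strip_prompts(cell_source: str) -> str:
    """Remove >>> / ... prompts and discard lines that carry no prompt."""
    code_lines = []
    for line in cell_source.splitlines():
        stripped = line.lstrip()
        for prompt in (PS1, PS2):
            if stripped.startswith(prompt):
                rest = stripped[len(prompt):]
                # The prompt is conventionally followed by one space; drop it.
                code_lines.append(rest[1:] if rest.startswith(" ") else rest)
                break
        # Lines without a prompt are assumed to be output and are skipped.
    return "\n".join(code_lines)


cell = ">>> def function(a,b):\n...    return a+b\n>>> function(1,2)\n3"
if looks_like_interactive(cell):
    print(strip_prompts(cell))
```

The extracted text is ordinary Python that a parser accepts, so the linter and formatter could run on it and the prompts could be re-attached afterwards. The standard library's `doctest.DocTestParser` does a similar split of examples and expected output, which may be a useful reference for edge cases such as output lines that themselves begin with `...`.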

SergejsKims commented 5 months ago

I see similar behavior with Databricks notebooks. In Databricks it is normal to write %pip install ..., but in that case Ruff fails with Unexpected token '%'. At the least, it would be good to skip such lines.

dhruvmanila commented 5 months ago

What's the difference between a Databricks Notebook and a Jupyter Notebook? Do they have a different file extension? If it's .ipynb, then Ruff does support parsing those magic commands (%pip install ...).

SergejsKims commented 5 months ago

Databricks notebooks have the "*.py" extension.

dhruvmanila commented 5 months ago

Ruff uses the file extension to determine the expected syntax. If it's a .py file, it expects Python source code; similarly, if it's .ipynb, it expects a JSON-encoded notebook source. I'm not sure why it uses the .py extension, because any Python parser would fail to parse such files.
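
As a quick illustration of that last point, the sketch below uses CPython's own `ast` module as a stand-in parser (it is not Ruff's parser, and `parses_as_python` is just a helper name chosen here). Both the magic command and the prompt-prefixed line from this thread are rejected as plain Python source.

```python
import ast


def parses_as_python(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses_as_python("%pip install ...\n"))   # False: '%' cannot start a statement
print(parses_as_python(">>> function(1,2)\n"))  # False: '>>' cannot start a statement
print(parses_as_python("function(1, 2)\n"))     # True: parsing does not resolve names
```

This matches the errors reported above: in a .py file, the parser sees `%` or `>>` as an operator with no left-hand operand, so it fails at the first token.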