astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
31.09k stars 1.03k forks source link

Check `CPY` rules in the first cell (markdown, code, or raw cell type) #12802

Open adamjstewart opened 1 month ago

adamjstewart commented 1 month ago

Keywords

flake8-copyright, CPY, Jupyter Notebook, .ipynb

Steps to reproduce

Using the following pyproject.toml:

[tool.ruff]
extend-include = ["*.ipynb"]

[tool.ruff.lint]
extend-select = ["CPY"]
preview = true

[tool.ruff.lint.flake8-copyright]
author = "..."

if I run ruff check on my repo, all files pass except the Jupyter Notebook files. This is because ruff only checks the first 4096 bytes of the file, whereas the copyright may be in the first cell of the notebook farther below.

Version

Ruff 0.5.7

Solution

We would likely need to check more bytes, or allow the total number of bytes to be set on a per-repo or per-file basis. For now, the only workaround is to disable CPY on Jupyter Notebooks.

MichaReiser commented 1 month ago

Hmm, the notebook code should run on the concatenated contend of all python cells. Could you share a notebook with us that demonstrates the issue?

adamjstewart commented 1 month ago

Try this one: https://github.com/microsoft/torchgeo/blob/v0.5.2/docs/tutorials/getting_started.ipynb

You'll need:

[tool.ruff.lint.flake8-copyright]
author = "Microsoft Corporation"
notice-rgx = "Copyright \\(c\\)"
charliermarsh commented 1 month ago

Thanks, I'll try to reproduce.

charliermarsh commented 1 month ago

Hmm, I ran this locally and got:

getting_started.ipynb:cell 4:1:1: CPY001 Missing copyright notice at top of file

My pyproject.toml looks like:

[tool.ruff]
extend-include = ["*.ipynb"]

[tool.ruff.lint]
extend-select = ["CPY"]
preview = true

[tool.ruff.lint.flake8-copyright]
author = "Microsoft Corporation"
notice-rgx = "Copyright \\(c\\)"
dhruvmanila commented 1 month ago

@adamjstewart Is this working on your end? Could we mark this issue as resolved?

adamjstewart commented 1 month ago

This is not working, I see CPY001 even though the notebook contains a copyright.

MichaReiser commented 1 month ago

I think the problem here is that Ruff only looks at python cells, but the license is in a markdown cell.

And it seems that @charliermarsh is able to successfully reproduce. The expected result is no diagnostics

MichaReiser commented 1 month ago

To get this to work, I think we would have to special case the rule for notebooks or users have to move the license into a python cell.

adamjstewart commented 1 month ago

Ah, this makes sense. Not sure how hard special casing the rule would be, but I expect it's more common to find copyrights in markdown cells than in code cells. After all, it isn't code, it's a message to the user.

dhruvmanila commented 1 month ago

Oh, I see. I misunderstood Charlie's message, sorry for that.

Yeah, we might have to special case this rule with some additional changes in the Notebook struct to access the markdown cells. It would also need to be validated in an editor context because we only ask for Python cells from the client:

https://github.com/astral-sh/ruff/blob/5d85fb75eaf03f62c10472e7362ad2f1c8250e7a/crates/ruff_server/src/server.rs#L335-L344