astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
32.49k stars 1.08k forks source link

Check `CPY` rules in the first cell (markdown, code, or raw cell type) #12802

Open adamjstewart opened 2 months ago

adamjstewart commented 2 months ago

Keywords

flake8-copyright, CPY, Jupyter Notebook, .ipynb

Steps to reproduce

Using the following pyproject.toml:

[tool.ruff]
extend-include = ["*.ipynb"]

[tool.ruff.lint]
extend-select = ["CPY"]
preview = true

[tool.ruff.lint.flake8-copyright]
author = "..."

if I run ruff check on my repo, all files pass except the Jupyter Notebook files. This is because ruff only checks the first 4096 bytes of the file, whereas the copyright may be in the first cell of the notebook farther below.

Version

Ruff 0.5.7

Solution

We would likely need to check more bytes, or allow the total number of bytes to be set on a per-repo or per-file basis. For now, the only workaround is to disable CPY on Jupyter Notebooks.

MichaReiser commented 2 months ago

Hmm, the notebook code should run on the concatenated contend of all python cells. Could you share a notebook with us that demonstrates the issue?

adamjstewart commented 2 months ago

Try this one: https://github.com/microsoft/torchgeo/blob/v0.5.2/docs/tutorials/getting_started.ipynb

You'll need:

[tool.ruff.lint.flake8-copyright]
author = "Microsoft Corporation"
notice-rgx = "Copyright \\(c\\)"
charliermarsh commented 2 months ago

Thanks, I'll try to reproduce.

charliermarsh commented 2 months ago

Hmm, I ran this locally and got:

getting_started.ipynb:cell 4:1:1: CPY001 Missing copyright notice at top of file

My pyproject.toml looks like:

[tool.ruff]
extend-include = ["*.ipynb"]

[tool.ruff.lint]
extend-select = ["CPY"]
preview = true

[tool.ruff.lint.flake8-copyright]
author = "Microsoft Corporation"
notice-rgx = "Copyright \\(c\\)"
dhruvmanila commented 2 months ago

@adamjstewart Is this working on your end? Could we mark this issue as resolved?

adamjstewart commented 2 months ago

This is not working, I see CPY001 even though the notebook contains a copyright.

MichaReiser commented 2 months ago

I think the problem here is that Ruff only looks at python cells, but the license is in a markdown cell.

And it seems that @charliermarsh is able to successfully reproduce. The expected result is no diagnostics

MichaReiser commented 2 months ago

To get this to work, I think we would have to special case the rule for notebooks or users have to move the license into a python cell.

adamjstewart commented 2 months ago

Ah, this makes sense. Not sure how hard special casing the rule would be, but I expect it's more common to find copyrights in markdown cells than in code cells. After all, it isn't code, it's a message to the user.

dhruvmanila commented 2 months ago

Oh, I see. I misunderstood Charlie's message, sorry for that.

Yeah, we might have to special case this rule with some additional changes in the Notebook struct to access the markdown cells. It would also need to be validated in an editor context because we only ask for Python cells from the client:

https://github.com/astral-sh/ruff/blob/5d85fb75eaf03f62c10472e7362ad2f1c8250e7a/crates/ruff_server/src/server.rs#L335-L344