astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License

Feature Request: Check for exact copyright header #5306

Open: rbebb opened this issue 1 year ago

rbebb commented 1 year ago

Building off of the request for flake8-copyright, it would be great to have the ability to check for an exact copyright header in any file!

edgarrmondragon commented 1 year ago

There's regex support via flake8-copyright.notice-rgx, so maybe you could use that to specify the exact header you want.

(Though an exact header setting would make it trivial to implement autofix :D)
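
A minimal sketch of why an exact-header setting makes the autofix trivial. EXPECTED_HEADER and the helper functions are hypothetical, not an existing Ruff option or API. The check is a plain prefix comparison, and the fix just prepends the header when it is missing:

```python
from pathlib import Path

# Hypothetical exact header; a real tool would read this from configuration.
EXPECTED_HEADER = (
    "# Copyright 2023 Example Corp.\n"
    "# SPDX-License-Identifier: MIT\n"
)


def has_exact_header(source: str, header: str = EXPECTED_HEADER) -> bool:
    """Return True if the file starts with the header, character for character."""
    return source.startswith(header)


def fix_header(source: str, header: str = EXPECTED_HEADER) -> str:
    """Prepend the header (plus a blank line) when missing; otherwise return source unchanged."""
    if has_exact_header(source, header):
        return source
    return header + "\n" + source


def check_and_fix(path: Path, header: str = EXPECTED_HEADER) -> bool:
    """Rewrite the file in place if needed; return True if it was modified."""
    original = path.read_text(encoding="utf-8")
    fixed = fix_header(original, header)
    if fixed != original:
        path.write_text(fixed, encoding="utf-8")
        return True
    return False
```

The fix is idempotent (applying it twice changes nothing), but note that this naive version would prepend a second header above an outdated one rather than replace it.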

charliermarsh commented 1 year ago

Do you mind providing an example of the header you might want to enforce? Is it, like, the copyright notice content that you want to validate?

rbebb commented 1 year ago

Hey @charliermarsh, it is the copyright notice content that I'd want to validate. Here's an example!

https://github.com/cs3org/reva/blob/master/cmd/reva/app-tokens-remove.go#L1-L17

sbrugman commented 1 year ago

This feature would also be useful for popmon. We currently use Google's addlicense, which supports popular licenses out of the box (e.g. Apache, BSD, MIT, MPL).

Since that project already uses ruff, I'd love to switch.

Example file: https://github.com/ing-bank/popmon/blob/master/popmon/analysis/functions.py#L1C1-L1C1

airwoodix commented 1 year ago

flake8-copyright is a good starting point for this, although converting the header to validate into a regular expression is a bit tedious (lots of escapes).

However, the hardcoded 1024-byte limit quickly gets in the way. Would it be an option to make it configurable?
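
The escaping can be automated instead of done by hand. A minimal sketch, assuming it is enough to escape the core metacharacters that both Python's re and Rust's regex crate treat specially (Ruff compiles notice-rgx with the latter). The helper names and the SEARCH_LIMIT_BYTES constant are hypothetical; the constant only mimics the 1024-byte window described above, to show how a long docstring pushes a notice out of reach:

```python
import re

# Metacharacters special in both Python `re` and Rust `regex` (assumed sufficient here).
_META = set("\\.+*?()|[]{}^$")

# Mirrors the hardcoded window discussed in this thread; illustrative only.
SEARCH_LIMIT_BYTES = 1024


def header_to_regex(header: str) -> str:
    """Turn an exact header into a regex that matches it literally."""
    return "".join("\\" + ch if ch in _META else ch for ch in header)


def notice_in_prefix(source: str, pattern: str, limit: int = SEARCH_LIMIT_BYTES) -> bool:
    """Search for the pattern only within the first `limit` bytes of the file."""
    prefix = source.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
    return re.search(pattern, prefix) is not None
```

When pasting the generated pattern into pyproject.toml, a TOML multi-line literal string ('''...''') keeps the backslashes as-is, so they do not need to be doubled.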

zanieb commented 1 year ago

Would it be useful to use the contents of the LICENSE file as the expected header?

airwoodix commented 1 year ago

@zanieb, looking at the GPL guidelines, I understand that the LICENSE file would contain a complete copy of the license, while each source file starts with only the much shorter license notice. The Apache-2.0 license seems to make a similar distinction.

sbrugman commented 1 year ago

Suggestion: provide a different message than "Missing copyright notice at top of file" when the contents do not match the regex or exact header. As a user, it is currently hard to debug the regex (especially a multiline one).

Would it be an option to split missing/present and match/no match into two codes?

Minor detail: the rules list does not include CPY001 at the moment.
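
A minimal sketch of the suggested split into "missing" versus "present but not matching", plus a helper that points at the first differing character of an exact header to make debugging easier. The HeaderStatus values, the loose detection regex, and the helper names are all hypothetical, not Ruff codes or APIs:

```python
import enum
import re
from typing import Optional


class HeaderStatus(enum.Enum):
    OK = "ok"
    MISSING = "missing"    # nothing that looks like a notice at all
    MISMATCH = "mismatch"  # something notice-like is present, but the configured regex fails


# Loose heuristic for "some notice is present"; an assumption for illustration.
_LOOSE = re.compile(r"(?i)copyright|spdx-filecopyrighttext")


def classify(source: str, notice_rgx: str) -> HeaderStatus:
    if re.search(notice_rgx, source):
        return HeaderStatus.OK
    if _LOOSE.search(source):
        return HeaderStatus.MISMATCH
    return HeaderStatus.MISSING


def first_divergence(source: str, expected: str) -> Optional[int]:
    """Index of the first character where source departs from expected, or None if it starts with it."""
    for i, (actual_ch, expected_ch) in enumerate(zip(source, expected)):
        if actual_ch != expected_ch:
            return i
    return None if len(source) >= len(expected) else len(source)
```

For example, checking "# Copyright 2024\n" against the expected "# Copyright 2023\n" reports index 15, the last digit of the year.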

jeaboswell commented 1 year ago

I agree completely that the hardcoded limit of 1024 bytes is an issue. If you are adhering to the D100 rule, your copyright attribute quickly gets pushed past that limit.

In my project, the only modules that don't need an exception to the rule are those that contain two to three functions.

Ideally, simply checking whether it appears anywhere in the file would be a good start. If we want to get really crazy, we could try to ensure it comes after the imports and before any code.
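
Both variants can be sketched with the standard library, assuming a single-line notice matched by a regex (a multi-line notice would need a whole-text search instead). The function names are hypothetical. The lenient check scans every line with no byte limit; the strict one uses ast to find where the module docstring and imports end and where the first other statement begins:

```python
import ast
import re
from typing import Optional


def notice_line(source: str, notice_rgx: str) -> Optional[int]:
    """1-based line number of the first line matching the notice regex, anywhere in the file."""
    pattern = re.compile(notice_rgx)
    for lineno, line in enumerate(source.splitlines(), start=1):
        if pattern.search(line):
            return lineno
    return None


def notice_between_imports_and_code(source: str, notice_rgx: str) -> bool:
    """Stricter variant: the notice must follow the docstring/imports and precede other code."""
    lineno = notice_line(source, notice_rgx)
    if lineno is None:
        return False
    tree = ast.parse(source)
    preamble_end = 0   # last line of the module docstring and leading imports
    first_code = None  # first line of any other top-level statement
    for index, node in enumerate(tree.body):
        is_docstring = (
            index == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        if is_docstring or isinstance(node, (ast.Import, ast.ImportFrom)):
            preamble_end = node.end_lineno or node.lineno
        else:
            first_code = node.lineno
            break
    return preamble_end < lineno and (first_code is None or lineno < first_code)
```

Because comments never appear in the AST, the notice's position is taken from the raw lines and only the statement boundaries come from ast.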

ThiefMaster commented 10 months ago

Shameless self-promotion, but I once wrote a script that a colleague then converted into a standalone project, unbeheader. It manages your copyright (or any other) header comments across your codebase, has CI checks, and can automatically add or update them (locally, not in CI).

I'd love to see support for its functionality in ruff, but unless a codebase contains only Python, using ruff to enforce exact copyright headers may not be ideal to begin with, since ruff would not handle non-Python file types.

oliverfunk commented 10 months ago

There should be support for SPDX file headers, given that they are increasingly recommended (Hatch, for example, uses them by default when creating a new project).

They generally follow this format:

```python
# SPDX-FileCopyrightText: <year> <name> <contact>
# ^ can be multiple of these lines
#
# SPDX-License-Identifier: <licence>
```

The year and contact are optional; the full spec is available at the links below.

References: https://reuse.software/spec/ https://github.com/fsfe/reuse-tool https://hatch.pypa.io/latest/config/project-templates/#licenses
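
A minimal sketch of reading such a header, assuming the tags appear as "#" comment lines at the top of the file in the format quoted above. The pattern names and the SpdxHeader class are hypothetical, and the license expression is captured as a raw string without validating it against the SPDX license list:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

_COPYRIGHT = re.compile(r"^#\s*SPDX-FileCopyrightText:\s*(?P<text>.+?)\s*$")
_LICENSE = re.compile(r"^#\s*SPDX-License-Identifier:\s*(?P<expr>.+?)\s*$")


@dataclass
class SpdxHeader:
    copyright_texts: List[str] = field(default_factory=list)
    license_expr: Optional[str] = None


def parse_spdx_header(source: str) -> SpdxHeader:
    """Collect SPDX tags from the leading block of comment lines."""
    header = SpdxHeader()
    for line in source.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        if m := _COPYRIGHT.match(line):
            header.copyright_texts.append(m.group("text"))
        elif m := _LICENSE.match(line):
            header.license_expr = m.group("expr")
    return header
```

A shebang line also starts with "#", so it is skipped rather than ending the header; a blank line, by contrast, ends it.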

I'm using the very ruff (pun not intended) regex for my config:

```toml
[tool.ruff.lint.flake8-copyright]
notice-rgx = "(?i)# SPDX-FileCopyrightText:\\s\\d{4}(-(\\d{4}|present))*"
```
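
One way to sanity-check a pattern like this before committing it is to run it against a few sample header lines. A minimal sketch, assuming Python's re behaves like Ruff's Rust regex engine for a pattern this simple. The sample lines and the check_samples helper are illustrative only:

```python
import re

# The notice-rgx value after TOML decoding: in a TOML basic string ("..."),
# each `\\` becomes a single backslash, so `\\s` reaches the regex engine as `\s`.
NOTICE_RGX = r"(?i)# SPDX-FileCopyrightText:\s\d{4}(-(\d{4}|present))*"

SAMPLES = {
    "# SPDX-FileCopyrightText: 2023 Jane Doe <jane@example.com>": True,
    "# SPDX-FileCopyrightText: 2020-present Acme Inc.": True,
    "# spdx-filecopyrighttext: 2019-2024 Acme Inc.": True,  # (?i) ignores case
    "# SPDX-FileCopyrightText: Acme Inc.": False,           # a year is required
    "# Copyright 2023 Acme Inc.": False,                    # not an SPDX tag
}


def check_samples(pattern: str, samples: dict) -> list:
    """Return the sample lines whose match result differs from the expectation."""
    compiled = re.compile(pattern)
    return [line for line, expected in samples.items()
            if bool(compiled.search(line)) != expected]
```

With these samples, check_samples(NOTICE_RGX, SAMPLES) returns an empty list.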

aaronsteers commented 4 months ago

+1 to this request. I'd also like to raise an alternative, more generic implementation: a "file_starts_with" customization.

This variant would allow auto-fixing CPY1 indirectly, but through a more generic approach: Ruff could check for (and auto-fix) a mandatory preamble in each Python file.
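
A minimal sketch of what a "file_starts_with" check and fix could do. The option name comes from the comment above and is not an existing Ruff setting; the helper names are hypothetical. One design wrinkle is that a shebang and a PEP 263 encoding declaration must stay on the first lines, so the preamble is checked and inserted after them:

```python
import re

# PEP 263 encoding declaration; it only takes effect on line 1 or 2.
_CODING = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def split_leading_pragmas(source: str):
    """Split off a shebang and/or encoding line, which must stay first, from the rest."""
    lines = source.splitlines(keepends=True)
    count = 0
    if lines and lines[0].startswith("#!"):
        count = 1
    if count < len(lines) and _CODING.match(lines[count]):
        count += 1
    return "".join(lines[:count]), "".join(lines[count:])


def starts_with_preamble(source: str, preamble: str) -> bool:
    _, body = split_leading_pragmas(source)
    return body.startswith(preamble)


def apply_preamble(source: str, preamble: str) -> str:
    """Insert the preamble after any shebang/encoding line if it is missing."""
    pragmas, body = split_leading_pragmas(source)
    if body.startswith(preamble):
        return source
    return pragmas + preamble + body
```

Like an exact-header check, this is a pure prefix comparison, so any mandatory preamble (copyright notice, license notice, or something else) fits the same mechanism.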

adrinjalali commented 3 months ago

I just ended up here because we wanted to enable this rule in https://github.com/scikit-learn/scikit-learn/ and ran into issues that took us quite a while to figure out.

For now I've put up #11927, which should at least fix the issues a lot more people are hitting.