go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Adopt REUSE best practices to clarify copyright and licensing #16132

Open mxmehl opened 3 years ago

mxmehl commented 3 years ago

Description

I suggest that Gitea becomes REUSE compliant. That would mean that every file in this repository is unambiguously marked with its copyright and licensing information. It would be good timing to do so since the FSFE (a heavy Gitea user itself) is currently offering projects some help in adopting these best practices.

Gitea is distributed under the MIT license and is a very popular project, so developers are likely to reuse whole files or parts of them in their own projects. That's completely fine, but it's beneficial if the license and copyright information is retained when they do.

Especially if some files are under a different license (e.g. Creative Commons) it's important to be aware of this to avoid a license violation. On a quick search I found that some files under vendor/ are licensed under Apache-2.0, e.g. from mongodb and opentelemetry. The whole vendor section might be the most difficult one to make REUSE compliant as it contains 3rd-party code for which editing license and copyright information should be avoided.

That is why external help is probably a good idea :)

jolheiser commented 3 years ago

All direct Gitea source files are marked with a license header.

Unfortunately, the vendor directory can't really be altered because afaik go mod vendor and such are responsible for keeping those files in sync. That is to say, whenever a vendor is updated it would be overwritten.
As well, I'm not sure it's good practice to add content (even just comments about licensing) to vendored files.

> That is why external help is probably a good idea :)

If someone wishes to comment on this issue, I'd be interested in seeing what changes would need to be made (if any) and if it's viable given Go's tooling.

mxmehl commented 3 years ago

For the copyright lines, the example file you've given is perfectly fine.

However, for the license, it's best practice to use SPDX license identifiers with the corresponding tag. So for this file, you could replace

// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.

with a mere SPDX-License-Identifier: MIT. That makes it easy to understand for humans and machines alike, and also enables an easy search for files that are lacking such information. Otherwise, you'd have to trust heuristics.
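As a rough illustration of that swap, here is a minimal Python sketch. The sample header (the year and the `package models` line) and the helper names `replace_legacy_notice` and `has_spdx_license` are made up for this example and are not part of Gitea or the REUSE tool. It only handles the exact two-line notice quoted above.

```python
import re

# The two-line notice quoted above, tolerant of trailing whitespace.
LEGACY_NOTICE = re.compile(
    r"^// Use of this source code is governed by a MIT-style[ \t]*\n"
    r"// license that can be found in the LICENSE file\.[ \t]*$",
    re.MULTILINE,
)
SPDX_LINE = "// SPDX-License-Identifier: MIT"


def replace_legacy_notice(source: str) -> str:
    """Swap the legacy two-line notice for a single SPDX tag line."""
    return LEGACY_NOTICE.sub(SPDX_LINE, source)


def has_spdx_license(source: str) -> bool:
    """Report whether the text carries any SPDX-License-Identifier tag."""
    return re.search(r"SPDX-License-Identifier:\s*\S+", source) is not None


# Hypothetical file content, used only to exercise the helpers.
example = (
    "// Copyright 2019 The Gitea Authors. All rights reserved.\n"
    "// Use of this source code is governed by a MIT-style\n"
    "// license that can be found in the LICENSE file.\n"
    "\n"
    "package models\n"
)
converted = replace_legacy_notice(example)
print(converted)
```

The `has_spdx_license` check is the "easy search" part: once the tag is present, finding unmarked files is a plain string match rather than a guess at the wording of a notice.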

> Unfortunately, the vendor directory can't really be altered because afaik go mod vendor and such are responsible for keeping those files in sync. That is to say, whenever a vendor is updated it would be overwritten. As well, I'm not sure it's good practice to add content (even just comments about licensing) to vendored files.

I understand. I'm not so familiar with Go and its vendoring.

What you could do is make a bulk declaration for the individual directories without touching the files, using DEP-5.
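For illustration, a bulk declaration in `.reuse/dep5` (the location that version 3.0 of the REUSE Specification reads) could look roughly like the fragment below. The directory glob is a guess based on the mongodb module mentioned earlier, and the copyright line is a placeholder that would have to be filled in from the upstream project, so treat this as a sketch and not a ready-to-commit file.

```text
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: gitea
Source: https://github.com/go-gitea/gitea

Files: vendor/go.mongodb.org/*
Copyright: <copyright holders as stated upstream>
License: Apache-2.0
```

Each further directory with its own license or copyright holder would get its own `Files:` stanza, separated by a blank line.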

mxmehl commented 2 years ago

With the vendor directory gone, making Gitea compatible with the REUSE best practices, and thereby providing clear open source licensing, has become much easier.

I'd be happy to help with this, but it only works if a Gitea maintainer provides some info and reviews the added copyright and licensing info. @techknowlogick AFAIU you currently work on the NGI-granted federation project and have perhaps also heard about REUSE in that process. Would you like to be a sparring partner here as well? Or someone else?

techknowlogick commented 2 years ago

@mxmehl I am indeed working on the NGI grant. Would you be open to coming to our (as of yet unscheduled) April community video call and presenting on REUSE? We usually cap them at an hour in length, so you could use as much or as little of that time as you'd need. I'd be very interested in hearing more about REUSE, and seeing if/how it could be implemented in Gitea. With "software supply chain" and "software bill of materials" becoming hot topics as of late, I think it would be especially relevant to our project.

mxmehl commented 2 years ago

@techknowlogick Sorry for the late reply! Somehow this missed my inbox and I only found it by accident.

April went by, but I would be happy to join a future call and present REUSE there. Thanks for providing the opportunity!

mxmehl commented 1 year ago

Thanks, that's a large step forward, but I'm afraid this is not completed yet. More than 50% of the files are not fixed yet:

# SUMMARY

* Bad licenses:
* Deprecated licenses:
* Licenses without file extension:
* Missing licenses: MIT
* Unused licenses:
* Used licenses: MIT
* Read errors: 0
* Files with copyright information: 2460 / 4971
* Files with license information: 1871 / 4971

Unfortunately, your project is not compliant with version 3.0 of the REUSE Specification :-(

Again, the REUSE team would be happy to assist with this (/cc @lnceballosz)
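To make the "> 50%" figure concrete, the two ratio lines of the summary can be turned into shares of files still missing information. This is only a small arithmetic sketch in Python. The `missing_shares` helper is hypothetical, and the regular expression assumes the exact wording of the two lines shown in the report above.

```python
import re

# The two ratio lines copied from the summary above.
SUMMARY = """\
* Files with copyright information: 2460 / 4971
* Files with license information: 1871 / 4971
"""


def missing_shares(summary: str) -> dict[str, float]:
    """Return, per kind of information, the percentage of files lacking it."""
    shares = {}
    pattern = r"Files with (\w+) information: (\d+) / (\d+)"
    for kind, have, total in re.findall(pattern, summary):
        shares[kind] = 100.0 * (int(total) - int(have)) / int(total)
    return shares


for kind, share in missing_shares(SUMMARY).items():
    print(f"missing {kind} information: {share:.1f}%")
```

With the numbers above this gives about 50.5% of files without copyright information and about 62.4% without license information.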

lunny commented 1 year ago

We also merged a check from our own check tool, gitea-vet, in #22004. How did you generate that report?

techknowlogick commented 1 year ago

@lunny I'm guessing probably the https://reuse.readthedocs.io/en/stable/readme.html CLI tool. The linked PR only touches the golang files, but the JS files will need to be updated too (or, as @mxmehl says above, add a dep5 file so you don't need to edit every single file).

mxmehl commented 1 year ago

Exactly. I recommend giving the CLI tool a try. Using the reuse addheader command and some of its flags, you can bulk-annotate many files at once, e.g.:

reuse addheader -c "The Gitea Authors" -l MIT --recursive --skip-unrecognised .

The command above may mess with some existing copyright statements. The newly introduced flag --no-replace may be useful, or you could switch altogether to the proper SPDX-FileCopyrightText tag instead, or run separate search/replace operations to unify them.
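The search/replace route could be sketched roughly like this in Python. It assumes headers of the form `// Copyright <year> ...`, and the sample header text and the `unify_copyright` helper name are illustrative only. The sketch rewrites just the prefix and leaves the rest of each statement untouched, since what else to change in a copyright line is a decision for the project.

```python
import re

# Match only the "// Copyright " prefix when a four-digit year follows it.
LEGACY_PREFIX = re.compile(r"^// Copyright (?=\d{4}\b)", re.MULTILINE)
SPDX_PREFIX = "// SPDX-FileCopyrightText: "


def unify_copyright(source: str) -> str:
    """Rewrite legacy copyright prefixes to the SPDX-FileCopyrightText tag."""
    return LEGACY_PREFIX.sub(SPDX_PREFIX, source)


# Hypothetical header, used only to exercise the helper.
header = (
    "// Copyright 2014 The Gogs Authors. All rights reserved.\n"
    "// Copyright 2019 The Gitea Authors. All rights reserved.\n"
    "// SPDX-License-Identifier: MIT\n"
)
unified = unify_copyright(header)
print(unified)
```

Running it a second time changes nothing, so it would be safe to apply to a tree in which some files have already been converted.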

I'd personally only use dep5 for cases in which you have a large number of files in a directory that you cannot comment, e.g. binary test data, icons, or JSON files, because you don't want to add separate {filename}.license files for each of them. For everything else, annotating the files directly makes sure the copyright and licensing information stays with the source code file and not in a rather remote metadata file.