QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Create a script that automatically adds/updates license headers in all files available in repositories #7320

Open fepitre opened 2 years ago

fepitre commented 2 years ago

We lack a license field in a huge number of files. We need a script that adds the license entry and updates it yearly based on git authors.
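
A minimal sketch of the "based on git authors" part, assuming the input comes from `git log --follow --date=format:%Y --format='%ad|%an <%ae>' -- <file>`. The function names and the `Copyright (C)` output format are illustrative assumptions, not something the thread settles on.

```python
import subprocess
from typing import Dict, List, Tuple


def parse_author_years(log_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each author to (first_year, last_year) from 'YEAR|Author' lines."""
    spans: Dict[str, Tuple[int, int]] = {}
    for line in log_text.splitlines():
        year_text, sep, author = line.partition("|")
        if not sep or not year_text.isdigit():
            continue  # skip blank or malformed lines
        year = int(year_text)
        first, last = spans.get(author, (year, year))
        spans[author] = (min(first, year), max(last, year))
    return spans


def copyright_lines(spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """Render one copyright line per author, oldest contributor first."""
    ordered = sorted(spans.items(), key=lambda item: (item[1][0], item[0]))
    lines = []
    for author, (first, last) in ordered:
        years = str(first) if first == last else f"{first}-{last}"
        lines.append(f"Copyright (C) {years} {author}")
    return lines


def git_author_years(path: str) -> Dict[str, Tuple[int, int]]:
    """Query git for one file; assumes git is on PATH and cwd is a work tree."""
    result = subprocess.run(
        ["git", "log", "--follow", "--date=format:%Y",
         "--format=%ad|%an <%ae>", "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_author_years(result.stdout)
```

Running this yearly would only change a line when an author's last commit year moves forward, which keeps the resulting diffs small.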

iacore commented 2 years ago

I don't think this is useful. Just having LICENSE in the git repo is enough. You don't even need to provide the whole license text; a URL to a webpage with the full license text is enough to be legally binding (at least if the copyright offender and you both live in the US). By default (without any license), no one can use the project. However, GitHub's terms say that by being hosted publicly on GitHub, anyone can fork and use it freely for personal use.

I hate projects with a license in every single source file, even taking up huge space in minified JavaScript files for deployment. If someone like AWS wants to steal an open-source project, they will do it anyway.

If we want to give contributors fame, it's better to put that on the Qubes OS website instead in source code.

marmarek commented 2 years ago

The very LICENSE file you refer to mandates those comments:

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

More on that here: https://www.gnu.org/licenses/gpl-faq.html#LicenseCopyOnly

fepitre commented 2 years ago

> I hate projects with a license in every single source file, even taking up huge space in minified JavaScript files for deployment. If someone like AWS wants to steal an open-source project, they will do it anyway.

We care about legal stuff and the issue is about updating those mandatory fields and notably copyright entries.

ProfessorManhattan commented 2 years ago

Hey @fepitre @marmarek @locriacyber @andrewdavidwong -- excited to be a first time contributor doing something that I personally think matters a lot. At first I thought this would be a 30 minute Node.js script but there's actually quite a bit that will be involved with a proper implementation. Here's the rundown:

  1. We can use a tool to insert the notice in each file automatically (at first; more on why we cannot completely automate the process below). I created a configuration for a Rust CLI tool that seemed to fit here: https://github.com/notken12/licensesnip/pull/2/commits/fa1956de4c9c13052e749a6f4746406c6bd0b3b8 -- I realize the project only has a couple of stars, but there are a couple of NPM modules we can piece together that will accomplish the same thing as well.
  2. Second, what do we put in the header of each file? With my 4 hour dive into source code license stamping standards, I determined that SPDX is the way to go - it's an ISO standard and used by the Linux kernel project as well. It will make Qubes OS source code easy to read for both humans and machines.
  3. How do we format that header? Basically, according to this blog article - we should format it like this:
    
    SPDX-FileCopyrightText: © {$year_of_file_creation} {$name_of_copyright_holder} <{$contact}>

    SPDX-License-Identifier: {$SPDX_license_name}

  4. The article points out that its source is [Reuse](https://reuse.software/). Reuse's standard is used by https://github.com/torvalds/linux. Basically, you include the comment formatted as above in all the files and additionally create a `LICENSES/` folder in the root of each repository that contains all the different licenses used by the project. All of our code should be GPL-2.0-only, but since we're going this far, we might as well design the system to accommodate things like fonts, images, and copied code that should be identified with their proper licenses. Reuse is available via pipx and many Linux package managers.
  5. In 2019, Reuse was trying to get GitHub to adopt its standard, but I assume that lost traction. To make GitHub add the license badge to search results, we need to include the Qubes OS license in the traditional `LICENSE` file in the root of the repository. This means we would have two identical license files: one at `LICENSE` and one at `LICENSES/GPL-2.0-only.txt`. The Linux kernel source code repository does not include this file, perhaps for a good reason: in multi-license repositories, GitHub would convey to the user that the repository uses one specific license. We might want to take a hybrid approach and include the `LICENSES/` folder if more than one license is used by the files in the repository, and use the `LICENSE` file if there is only one. I'll leave this up to you guys --- I can implement it however you want.
  6. At first, we can do a bulk SPDX copyright injection, but after that, each new file will need the comment added at the top by hand, since this system is meant to account for cases where different license types are used across different files. However, we can incorporate Reuse's linter as a pre-commit hook or a GitHub CI step.
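
The header format from item 3 and the lint check from item 6 can be sketched as below. This is an illustration only, not the REUSE tool or its API: the function names, the comment-prefix parameter, and the shebang handling are assumptions made for the example.

```python
from typing import Dict, List

TAG = "SPDX-License-Identifier:"


def spdx_header(year: int, holder: str, contact: str,
                license_id: str, prefix: str) -> str:
    """Render the two SPDX tags as comment lines using the given prefix."""
    return (
        f"{prefix}SPDX-FileCopyrightText: © {year} {holder} <{contact}>\n"
        f"{prefix.rstrip()}\n"
        f"{prefix}{TAG} {license_id}\n"
    )


def add_header(text: str, header: str) -> str:
    """Prepend the header once, keeping a leading shebang line first."""
    if TAG in text:
        return text  # already stamped; leave the file untouched
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + header + "\n" + "".join(lines[1:])
    return header + "\n" + text


def missing_spdx(files: Dict[str, str]) -> List[str]:
    """Lint-style check: names of files whose text lacks a license tag."""
    return sorted(name for name, text in files.items() if TAG not in text)
```

A pre-commit hook or CI step could fail whenever `missing_spdx` returns a non-empty list for the changed files. In practice the REUSE linter mentioned above would presumably fill that role, and this sketch only shows the shape of the check.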

Finally, I think the following notice (or something roughly equivalent) should be added to the bottom of each `README.md` file for all the repositories we want to cover with this system:

----------

## License

The original source code contained within this repository is licensed under [GPL-2.0-only](__PLACEHOLDER_REPOSITORY_HTTPS_URL__/blob/master/LICENSE).

In some cases, Qubes OS projects include other works that use a different license. All of the licenses that this particular project is licensed under are located in the `LICENSES/` folder. If multiple licenses are present in the `LICENSES/` folder, you can determine the license used by a particular file by inspecting each file's [SPDX identifier](https://spdx.dev/ids/) located at the top of the source code.

Copyright &copy; 2012 - 2022 The Qubes OS Project and others

ProfessorManhattan commented 2 years ago

Please provide some feedback and I can create a POC on a repo with various file types.

Also, the Qubes OS license reference on the website just points to documentation on how to write a license. We should fill out the template, hire a legal professional on Upwork to skim over it, and run it by whoever is legally responsible for Qubes. We can then use that license for all the repositories that share the same GPL license. There's also the case of a couple repositories being marked as LGPL -- I'm not sure what to make of that at this point.

I also don't think it makes sense to include contributors in each file. I think citing "The Qubes OS Project" as the author will be taken more seriously.

andrewdavidwong commented 2 years ago

Is this related to (or perhaps even a duplicate of) #6500?

CC @adrelanos

fepitre commented 2 years ago

This is related. This issue is really about creating an implementation that would do things automatically, as we do for kernel or pulseaudio updates.

adrelanos commented 2 years ago

> I don't think this is useful. Just having LICENSE in the git repo is enough. You don't even need to provide the whole license text; a URL to a webpage of the license full text is enough to be legally binding

Me in https://github.com/QubesOS/qubes-issues/issues/6500

  • Each source file needs a copyright header. I've asked FSF about that ~9 years ago by private e-mail and they said yes. Nowadays FSFE is recommending the same.

> (at least if the copyright offender and you both live in the US).

Consider non-US too. And Qubes imprint was non-US last time I checked.

> By default (without any license), no one can use the project.

Maybe, but it cuts both ways. If licensing isn't crystal clear, the resulting doubt discourages forking. Should Qubes ever be deprecated (let's hope not, and I am not aware of any indication that this is happening), this might prevent a successful fork because the licensing is less than perfect.

iacore commented 2 years ago

@ProfessorManhattan did your tool get copyright year from git blame?

NSfsfe commented 2 years ago

Hello,

By way of introduction, I am Niharika Singhal from the Free Software Foundation Europe and the REUSE Booster program. I am delighted to contribute to your project by replying to some of your proposed ideas.

1) The REUSE program has a tool that automatically inserts the copyright notice in each file. This is the REUSE helper tool, which adds the headers: it automatically deletes any existing headers and replaces them with the SPDX format.

2) You have to put the copyright and licensing information as the header in each file. The reuse addheader command helps you with formatting the header and adding the license information. More information on this is provided here: https://reuse.software/tutorial/

3) I have prepared a pull request for the repository - Qubes-Manager from this project, which will give an illustrative view of how the files could look after being REUSE compliant [Ref: https://github.com/NSfsfe/qubes-manager.git]. I think that would be helpful for you to get a better assessment.

If you want to discuss further REUSE specifications, feel free to reach out to me.

Good luck,
Niharika Singhal
Free Software Foundation Europe
Schönhauser Allee 6/7, 10119 Berlin, Germany | t +49-30-27595290
Registered at Amtsgericht Hamburg, VR 17030 | https://fsfe.org/support

ProfessorManhattan commented 2 years ago

Hey

> @ProfessorManhattan did your tool get copyright year from git blame?

No, according to the Reuse spec, all we need to add is the copyright year. The git blame is a good idea, perhaps, but I think it adds some complexity that would require pre-commit hooks or a CI step that commits to the branch.
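
For reference, a minimal sketch of what the git blame idea could look like, assuming the input is the output of `git blame --line-porcelain <file>`, which carries an `author-time` epoch timestamp for each surviving line. The function name and the choice of UTC are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def blame_years(porcelain: str) -> Optional[Tuple[int, int]]:
    """Return (earliest_year, latest_year) of the lines still present in a file.

    Source lines in porcelain output start with a tab, so they cannot be
    mistaken for the 'author-time' metadata lines parsed here.
    """
    years: List[int] = []
    for line in porcelain.splitlines():
        if line.startswith("author-time "):
            stamp = int(line.split(" ", 1)[1])
            years.append(datetime.fromtimestamp(stamp, tz=timezone.utc).year)
    return (min(years), max(years)) if years else None
```

Note that blame only sees lines that still exist, so the earliest year it reports can be later than the year the file was created.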

Hey @NSfsfe -- thanks for the example. I think this is it, with an example implementation from the organization that puts out the spec that the Linux kernel uses. My only recommendation would be to keep the notice out of the lint configuration files since that might cause some hiccups. Also, I think it would be better to use "The Qubes OS Project and others" as the copyright holder - that's what is cited as the owner on the website. If there are any issues with the copyright and one of the developers ghosts, it will be clear that all of the files are under the Qubes umbrella.

andrewdavidwong commented 2 years ago

> Also, I think it would be better to use "The Qubes OS Project and others" as the copyright holder - that's what is cited as the owner on the website.

The website and code repos are managed very differently. I chose that copyright notice for the website based on the nature of the content, how it all works, and common practices for websites; but I'm not a programmer and barely interact with the OS source code repos. So, we can't assume that the same holds of the code repos just because they're both part of the same project. While I wouldn't be surprised if the proposed language ended up also being appropriate for the code given its generality, I have no firm grounds for believing it does or does not. My point is merely to caution against the assumption that we can copy/paste between two radically different contexts and to make sure that the proposed language makes sense on its own merits in the code context.

ProfessorManhattan commented 2 years ago

Fair enough, I also thought it could potentially get sticky if the copyright notices were ever actually used for something. It would boil down to (Qubes vs Something) or (Dev1 vs Something and Dev2 vs Something and Dev3 vs Something) --- the point is that by assigning the notice to each individual developer, the power of the copyright is diluted. It could also get weird if one developer gives permission and another does not.

marmarek commented 1 year ago

> 3. I have prepared a pull request for the repository - Qubes-Manager from this project, which will give an illustrative view of how the files could look after being REUSE compliant [Ref: https://github.com/NSfsfe/qubes-manager.git]. I think that would be helpful for you to get a better assessment.

Thank you for the showcase PR @NSfsfe ! (for context, it's here: https://github.com/QubesOS/qubes-manager/pull/322)

Generally I think it all looks good, but I have a few questions:

  1. How should SPDX-FileCopyrightText be used when a file has multiple authors? For example https://github.com/QubesOS/qubes-manager/blob/main/qubesmanager/qube_manager.py
  2. Many of our files (especially Python code) already have a copyright header, including license info and copyright holders. Can the tool use that info when adding SPDX headers?
  3. Is there a recommended practice to keep copyright info in sync (especially the copyright holder part)?

adrelanos commented 1 year ago

To add a few questions:

  1. Can the tool also update debian/copyright (example) file, which is currently incomplete?
  2. Can the tool also create a COPYING file in the root of the source code folder?

marmarek commented 1 year ago

> Can the tool also create a COPYING file in the root of the source code folder?

This one is already covered here: https://reuse.software/faq/#single-license

NSfsfe commented 1 year ago

Hi @marmarek !

Thanks for all the questions.

  1. You can mention the name of one author and add "et al.". For example, `SPDX-FileCopyrightText: 2022 John Doe, johndoe@gmail.com et al` in a file. It is an excellent practice to then include a directory at the root of the project, titled AUTHORS, which includes all the names of the authors.

  2. The REUSE Tool automatically detects the copyright and license info and would also reflect the same in the results. However, you could also change that information by way of an automated command. To use the information from the existing python files, after the detection of the copyright and license information, you could manually check the header and then give an automated command to copy the same to other files. However, please note that even though the license and copyright information is detected, the syntax for a particular file may be incorrect, so you must use the REUSE automated add header tool to correct those existing python files as well for an overall compliance.

  3. Yes, as mentioned in Q.1, it is strongly recommended to include a directory at the root of the project including all the names of the authors. This way you could also modify the names in a single file in a folder rather than amending each file in different folders.
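
The three answers above can be illustrated with a small sketch that reads legacy `Copyright (C) ...` lines from an existing header, emits one SPDX tag with "et al.", and collects every name for an AUTHORS listing. This is not how the REUSE tool works internally: the regular expression, the function names, the sample names, and the treatment of AUTHORS as plain text are all assumptions for the example.

```python
import re
from typing import List, Tuple

# Matches e.g. "Copyright (C) 2012 Name <mail>" or "Copyright 2014 - 2015 Name".
LEGACY = re.compile(
    r"Copyright\s+(?:\(C\)|©)?\s*(\d{4}(?:\s*-\s*\d{4})?)\s+(.+)",
    re.IGNORECASE,
)


def legacy_holders(header: str) -> List[Tuple[str, str]]:
    """Extract (years, holder) pairs from a legacy comment header."""
    found = []
    for line in header.splitlines():
        match = LEGACY.search(line)
        if match:
            years = match.group(1).replace(" ", "")
            found.append((years, match.group(2).strip()))
    return found


def spdx_copyright(holders: List[Tuple[str, str]], prefix: str = "# ") -> str:
    """Name the first holder and append 'et al.' when there are more."""
    if not holders:
        return ""
    years, name = holders[0]
    suffix = " et al." if len(holders) > 1 else ""
    return f"{prefix}SPDX-FileCopyrightText: {years} {name}{suffix}"


def authors_listing(holders: List[Tuple[str, str]]) -> str:
    """All distinct holder names, one per line, for an AUTHORS listing."""
    names = sorted({name for _, name in holders})
    return "\n".join(names) + "\n"
```

Keeping the full list in one place means a change of name or e-mail address touches a single location rather than every file header.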

NSfsfe commented 1 year ago

Hi @adrelanos !

Regarding this question: Can the tool also update debian/copyright (example) file, which is currently incomplete?

No and this will never be implemented. Also, dep5 support will be deprecated and replaced by reuse.yaml soon.