biopython / biopython

Official git repository for Biopython (originally converted from CVS)
http://biopython.org/

BSD dual licensing the C code #2374

Open peterjc opened 4 years ago

peterjc commented 4 years ago

This is a spin-out from #898, focussed specifically on the C code within Biopython.

As with the Python code, we need to establish a header comment convention for the license text (and can then automate checking which C files have it, and which do not). Following the example set in Bio/Align/_aligners.c:

/* Copyright 2018-2019 by Michiel de Hoon.  All rights reserved.
 * This file is part of the Biopython distribution and governed by your
 * choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
 * Please see the LICENSE file that should have been included as part of this
 * package.
 */

and Bio/PDB/kdtrees.c:

/* This file is part of the Biopython distribution and governed by your
 * choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
 * Please see the LICENSE file that should have been included as part of this
 * package.
 */

I suggest we follow the same C comment syntax, which uses the same wording and line breaks as in the Python code:

/* Copyright ...
 *
 * This file is part of the Biopython distribution and governed by your
 * choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
 * Please see the LICENSE file that should have been included as part of this
 * package.
 *
...
*/
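
The automated check mentioned above could be sketched roughly as follows. This is a minimal illustration, not an existing Biopython script: the function names `has_dual_license_header` and `files_missing_header`, and the choice to inspect only the first 30 lines, are assumptions made for this example.

```python
import re
from pathlib import Path

# Wording copied from the header examples in this thread. Comment decoration
# ("/*", " * ", "*/") and runs of whitespace are normalised before matching.
LICENSE_PHRASES = (
    "This file is part of the Biopython distribution and governed by your",
    'choice of the "Biopython License Agreement" or the "BSD 3-Clause License".',
)


def has_dual_license_header(source, max_lines=30):
    """Return True if the dual-license wording appears near the top of source."""
    head = source.splitlines()[:max_lines]
    # Drop leading comment markers from each line, then join into one string.
    cleaned = " ".join(re.sub(r"^\s*(/\*|\*/|\*)?\s*", "", line) for line in head)
    cleaned = re.sub(r"\s+", " ", cleaned)
    return all(phrase in cleaned for phrase in LICENSE_PHRASES)


def files_missing_header(root):
    """List the *.c files under root that lack the dual-license wording."""
    missing = []
    for path in sorted(Path(root).rglob("*.c")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_dual_license_header(text):
            missing.append(str(path))
    return missing
```

Calling `files_missing_header("Bio")` from the repository root would then list the C files still needing attention. A file carrying only the old single-license wording is reported as missing, which is the intended behaviour here.
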
Amrithasuresh commented 4 years ago

Can I work on this? I have a few questions.

1) Bio/PDB/kdtrees.c has no copyright author name(s). Should I leave it without a specific author name? What about the year?

/* Copyright
 *
 * This file is part of the Biopython distribution and governed by your
 * choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
 * Please see the LICENSE file that should have been included as part of this
 * package.
 *
*/
2) Should I change the license text for the following files?
./Bio/KDTree/KDTreemodule.c
./Bio/KDTree/KDTree.c
./Bio/Cluster/cluster.c
./Bio/Cluster/clustermodule.c
./Bio/triemodule.c
./Bio/motifs/_pwm.c
./Bio/trie.c
./Bio/Align/_aligners.c
./Bio/Nexus/cnexus.c
./Bio/cpairwise2module.c
./Bio/PDB/kdtrees.c
./Bio/PDB/QCPSuperimposer/qcprotmodule.c
3) /Bio/cpairwise2module.c has multiple authors. Should I list them on multiple lines, like this?

    /* Copyright ...
    * Copyright ...
4) /Bio/Cluster/cluster.c: should I use the same license text, like this?

/* This file is part of the Biopython distribution and governed by your
 * choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
 * Please see the LICENSE file that should have been included as part of this
 * package.
 */

Thank you

peterjc commented 4 years ago

I'm going to pass these questions to @mdehoon as author or at least contributor to much of the C code in Biopython.

mdehoon commented 4 years ago

@Amrithasuresh @peterjc I have no specific preference. I would suggest following what is done in the Python modules.

peterjc commented 4 years ago

Where there is no copyright line at all, I would like to see a copyright line added with the original author's name. Where someone makes substantial revisions or additions, they typically have added their name already.

With multiple authors we'd have something like this (following the Python code examples):

/* Copyright 2018-2019 by Michiel de Hoon.  All rights reserved.
 * Revisions copyright 2019 Peter Cock.  All rights reserved.
 *
 * This file is part of the Biopython distribution and governed by your
 * choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
 * Please see the LICENSE file that should have been included as part of this
 * package.
 */
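
To keep the wording and line breaks identical across files, the header could be generated rather than typed by hand. The sketch below is only an illustration: `format_c_header` is a hypothetical helper, not part of Biopython, and it assumes the copyright lines are supplied by whoever has reviewed the file history.

```python
# License wording and line breaks as used in the examples in this thread.
LICENSE_BODY = (
    "This file is part of the Biopython distribution and governed by your",
    'choice of the "Biopython License Agreement" or the "BSD 3-Clause License".',
    "Please see the LICENSE file that should have been included as part of this",
    "package.",
)


def format_c_header(copyright_lines):
    """Build a C block comment from copyright lines plus the license wording."""
    lines = list(copyright_lines)
    if lines:
        lines.append("")  # blank " *" separator after the copyright lines
    lines.extend(LICENSE_BODY)
    out = []
    for i, text in enumerate(lines):
        prefix = "/*" if i == 0 else " *"
        # rstrip() avoids trailing whitespace on the blank separator line.
        out.append(f"{prefix} {text}".rstrip())
    out.append(" */")
    return "\n".join(out)


if __name__ == "__main__":
    print(format_c_header([
        "Copyright 2018-2019 by Michiel de Hoon.  All rights reserved.",
        "Revisions copyright 2019 Peter Cock.  All rights reserved.",
    ]))
```

Run as a script, this prints the two-author header shown above. Passing an empty list yields the header form without any copyright line.
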

The hard part is tracking all the contributors to each file, and verifying that we have a public dual licensing consent from each of them. Over on #898 we have many of the older contributors' agreements. For newer contributors, these will usually be on the relevant pull request.

To get the list of contributors to each file, we can use the git log (although file renames complicate this), but must also skim the commit comments - particularly in the early commits from CVS where commits were often made by a core developer (only a small pool of people had CVS commit rights) on behalf of a contributor. This part in particular benefits from a second set of eyes.
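
As a rough illustration of the git log step, something along these lines could collect the recorded authors of one file. The helper names `parse_authors` and `file_contributors` are made up for this sketch, and it only sees what git recorded as the commit author: contributions committed on someone else's behalf still need the commit comments read by hand.

```python
import subprocess


def parse_authors(log_output):
    """Return unique (name, email) pairs from 'name<TAB>email' lines, in order seen."""
    seen = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        name, _, email = line.partition("\t")
        pair = (name.strip(), email.strip().lower())
        if pair not in seen:
            seen.append(pair)
    return seen


def file_contributors(path, repo="."):
    """Run git log on a single file, following renames, and return its authors."""
    result = subprocess.run(
        # %aN = author name, %x09 = tab, %aE = author email
        ["git", "log", "--follow", "--format=%aN%x09%aE", "--", path],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_authors(result.stdout)
```

For example, `file_contributors("Bio/Nexus/cnexus.c")` run inside a clone would return the authors newest first, since that is the order in which git log prints commits. `--follow` helps with simple renames but is not guaranteed to trace every move of a file.
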

I have some Python scripts which I have been using for this (currently focussed on the *.py files), and some TSV files matching email address to GitHub username (where applicable) to a URL with their agreement (GitHub or mailing list links).
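
The actual scripts and TSV layout are not shown in this thread, so the following is only a guess at how such a lookup might work. It assumes three tab-separated columns (email, GitHub username, agreement URL), with `#` marking comment lines. The function names and that column order are assumptions for this sketch.

```python
import csv
import io


def load_agreements(tsv_text):
    """Map lower-cased email -> (github_username, agreement_url)."""
    agreements = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # Pad short rows so a missing username or URL becomes "".
        email, username, url = (row + ["", "", ""])[:3]
        agreements[email.strip().lower()] = (username.strip(), url.strip())
    return agreements


def missing_agreements(emails, agreements):
    """Return the emails that have no recorded agreement URL."""
    return [e for e in emails if not agreements.get(e.lower(), ("", ""))[1]]
```

Feeding the author emails for a file into `missing_agreements` would then give the people whose consent still has to be found or requested before that file can be dual licensed.
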

What I would like to see for each dual licensing commit is the agreements as links in the git commit comment, e.g. 7977d83f23e660e4d5602f15be8ae64de0a83b6c as the most recent example.

@Amrithasuresh There are relatively few C files, and relatively few people will have edited them, so perhaps try reviewing a couple of them by hand first, and see how you get on? If you think this is too time-consuming, I could instead work on my scripts and ask you to be the reviewer?

Amrithasuresh commented 4 years ago

Thank you for the explanation @peterjc.

> There are relatively few C files, and relatively few people will have edited them, so perhaps try reviewing a couple of them by hand first, and see how you get on?

I will do this first and you can review it.

Should I open a separate pull request for each commit? For example: 1) adding the license text to the file, and 2) dual licensing the file.

peterjc commented 4 years ago

If you want to add any missing copyright lines and the old licence as a first pull request, then OK - but I would suggest you start with a pull request doing just one file as a learning experiment. How about Bio/Nexus/cnexus.c since the history is short with just a few contributors?

Amrithasuresh commented 4 years ago

@peterjc Somehow I missed this message.

peterjc commented 4 years ago

That's fine - the example you picked is similar.