betterscientificsoftware / bssw.io

Better Scientific Software Homepage
https://bssw.io

contributor license agreement policies and alternatives #523

Closed carns closed 2 years ago

carns commented 4 years ago

I participate in some projects that use an ANL-approved CLA agreement based on the Apache Software Foundation's policies (https://www.apache.org/licenses/contributor-agreements.html). These agreements provide assurance that contributors will not assert IP claims on their contributions. This is meant to protect not just the maintainers, but also third party users and redistributors.

Conceptually that is fine, but there are two practical problems: a) these agreements are nearly impossible to get signed at some organizations and b) they greatly inhibit casual/agile contributions from users.

I would like to find a less cumbersome alternative. One likely option is the Developer Certificate of Origin (DCO, https://developercertificate.org/). GitLab switched to this model from a CLA some years back and wrote up their analysis at https://docs.google.com/document/d/1zpjDzL7yhGBZz3_7jCjWLfRQ1Jryg1mlIVmG8y6B1_Q/edit. From a process point of view, a DCO can be acknowledged by signing off on a git commit; it does not require engaging the legal department of the contributor's home organization.

I don't know if DCOs are used elsewhere in DOE projects, or if there are other alternatives to consider. My objective is simply to find an accepted alternative that's a more practical middle ground between a CLA and nothing.

Note: this issue is separate from the choice of what software license to apply to the code itself (at least in the cases that I am familiar with). I'm perfectly happy with the standard 3-clause BSD license for that purpose.

markcmiller86 commented 4 years ago

> Conceptually that is fine, but there are two practical problems: a) these agreements are nearly impossible to get signed at some organizations and b) they greatly inhibit casual/agile contributions from users.

@carns ... can you comment on whether b) is in spite of a) or because of a)? I mean, once an agreement is in place, however prohibitive the process for obtaining one is, is it then still the case that these agreements greatly inhibit casual/agile contributions?

markcmiller86 commented 4 years ago

While I think the article referenced in the initial issue comment is too terse to be truly informative to our readers, maybe one of these, which describe the differences between a DCO and a CLA, would be a better choice...

carns commented 4 years ago

> > Conceptually that is fine, but there are two practical problems: a) these agreements are nearly impossible to get signed at some organizations and b) they greatly inhibit casual/agile contributions from users.
>
> @carns ... can you comment on whether b) is in spite of a) or because of a)? I mean, once an agreement is in place, however prohibitive the process for obtaining one is, is it then still the case that these agreements greatly inhibit casual/agile contributions?

Mostly orthogonal issues. If someone wants to submit a modest one-time feature or bug fix that was helpful for a particular installation, they could be deterred by red tape regardless of whether their organization is amenable to CLAs.

I can appreciate that situation myself, actually. Several years ago I wanted to submit a patch to an external code base that would have required a CLA agreement. Rather than go through the overhead, I found a friendly developer for the project who was able to fix my problem based on a verbal description (I had no interest in attribution or long-term contribution, I simply needed my pet corner case to work). I could have gotten the CLA signed with relatively little friction compared to some people, but it would still have been high overhead for the size of the contribution.

bartlettroscoe commented 4 years ago

In my opinion, this is an important problem that needs a better solution. From what I have read in the past on this topic, what most projects are currently doing would not make the lawyers very happy and is on generally shaky legal ground.

I think it would be good to cover this topic.

markcmiller86 commented 3 years ago

I think a short CC (or article) that a) introduces abbreviations and verbiage and b) compares/contrasts key concepts to help teams decide DCO vs. CLA would be useful.

bartlettroscoe commented 3 years ago

@bernhold just mentioned the Developer Certificate of Origin. I like that idea. We need a way to automate contributor sign-off (e.g. by making contributors click a radio button in a GitHub PR or GitLab MR before it can be merged).

rinkug commented 3 years ago

Important topic for software researchers. How can we proceed on this? @bernhold was investigating some aspects of this?

bartlettroscoe commented 2 years ago

NOTE: DCO is the recommended way to address this issue according to the Linux Foundation's CII Best Practices [dco]:

I will find some good references for how to apply DCO to git- and GitHub-based projects and write up a CC article on the topic.

bartlettroscoe commented 2 years ago

Here is how the Linux Foundation suggests you apply a DCO with Git and GitHub:

and there is a standard GitHub app to enforce that every commit in a PR has been signed off:

How this is used in a real project is explained, for example, here:

What I don't really understand is how using the -s option with git commit -s to insert:

Signed-off-by: Random J Developer <random@developer.example.org>

to the bottom of each commit message magically implies the developer is asserting the DCO. Just because a commit contains the "Signed-off-by" line, how does that mean they are asserting the DCO? That seems a bit of a leap.
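As a purely mechanical illustration of what the trailer looks like to tooling, here is a minimal shell sketch that tests whether a commit message contains a Signed-off-by line. The function name `has_signoff` is hypothetical, and the check is a simplification: it accepts a matching line anywhere in the message and says nothing about what the sign-off means for a given project.

```shell
# has_signoff: read a commit message on stdin and succeed (exit 0) if it
# contains a line shaped like "Signed-off-by: Name <user@host>".
# Simplification: the line may appear anywhere in the message, not only in
# the trailer block at the end.
has_signoff() {
  grep -Eq '^Signed-off-by: .+ <[^<> ]+@[^<> ]+>$'
}
```

It could be used, for example, as `git log -1 --format=%B | has_signoff && echo "signed off"`.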

The official git commit documentation at:

says:

> -s, --signoff, --no-signoff
>
> Add a Signed-off-by trailer by the committer at the end of the commit log message. The meaning of a signoff depends on the project to which you’re committing. For example, it may certify that the committer has the rights to submit the work under the project’s license or agrees to some contributor representation, such as a Developer Certificate of Origin. (See http://developercertificate.org for the one used by the Linux kernel and Git projects.) Consult the documentation or leadership of the project to which you’re contributing to understand how the signoffs are used in that project.

I guess your project would have to document somewhere that a "sign-off" means that the developer is asserting the DCO for that commit. An example of that assumption/statement can be seen in this project:

I suppose the fact the developer has to manually add the -s option to git commit -s (and there is no way to do this automatically that I can see yet) is supposed to mean that the developer made a conscious choice to accept the DCO. I guess you can think of the -s option as the radio button I mention above.

I guess if all of this is common practice, then the presence of the standard Signed-off-by line in a commit should stand up in court as evidence that the developer asserted the DCO, and that should be good enough for lawyers.

But this means that every single commit in a PR has to have the Signed-off-by line or you can't accept it to your project. That also means every merge commit and every other commit created by an automated tool, for example, also has to include this Signed-off-by line. But you can amend any commit to add this line by doing git commit --amend -s so the process to sign the commits is easy enough (but requires more advanced knowledge of git). (But you can't amend an older merge commit back in history with git rebase -i.) This would seem to be quite an encumbrance.
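A per-commit requirement like this is easy to check mechanically. The sketch below lists the commits in a range that carry no Signed-off-by trailer. It is an illustration only, not how any particular GitHub app works; the function name `missing_signoff` and the `main..HEAD` range in the usage line are assumptions, and the `%(trailers:...)` format options require a reasonably recent Git.

```shell
# missing_signoff RANGE: print the hash of every commit in RANGE (for example
# main..HEAD) that has no Signed-off-by trailer.
missing_signoff() {
  # Each output line is "<hash> <sign-off values joined by commas>".
  # A commit without a sign-off yields only the hash, so awk sees one field.
  git log --format='%H %(trailers:key=Signed-off-by,valueonly,separator=%x2C)' "$1" |
    awk 'NF == 1 { print $1 }'
}
```

Running `missing_signoff main..HEAD` on a feature branch would then print the hashes that still need attention, including any merge commits in the range.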

Another problem is that given the distributed nature of Git with different repos, how do you know what agreement a developer is "signing off" with respect to when they created the commit? For example, if there are two projects A and B that have very different contributor requirements (e.g. where A uses a DCO and B uses a CLA) where commits are cherry-picked between the two projects, how do you know which agreement the developer is signing off w.r.t. when they created the commit?

I will do some more research and then write up a short article explaining all of this the best I can in a compact way.

carns commented 2 years ago

Thanks @bartlettroscoe. I can also give an update on what we've ended up doing here since this issue was originally posted.

We did not adopt the DCO mechanism, but we did streamline our CLA mechanism considerably to reduce friction for collaborators. A few things that I've learned:

bartlettroscoe commented 2 years ago

@carns, thanks for the additional info. What is your project on GitHub so I can see how you implemented this CLA workflow?

I am glad to see that someone has been working on automating CLA signoffs inside of GitHub PRs. The README at https://github.com/contributor-assistant/github-action/blob/master/README.md suggests that every Git commit author associated with a PR would need to sign the CLA by putting "I have read the CLA Document and I hereby sign the CLA" into a comment in the PR (and the GitHub Action would check that before allowing the merge). That would be a bit of a hassle if one of the commits was created by someone who can no longer go on GitHub and do that task (because they may have disappeared). It is also not clear to me whether a contributor only needs to sign once, for the first PR that they have a commit in, or for every PR (I would hope it is the former, but the documentation is not clear on that).
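The check described here amounts to a set difference: the commit authors on a PR minus the people who have signed (or are allowlisted). The following is a rough shell sketch of that idea only, not the action's actual implementation; the function name `unsigned_authors` and the one-login-per-line file format are assumptions made for illustration.

```shell
# unsigned_authors AUTHORS_FILE SIGNERS_FILE
# AUTHORS_FILE: one login per line, one line per commit author in the PR.
# SIGNERS_FILE: one login per line for everyone who signed or is allowlisted.
# Prints each author who is not listed in SIGNERS_FILE (nothing if all signed).
unsigned_authors() {
  # -F fixed strings, -x whole-line match, -v invert: keep non-signers only.
  # grep exits non-zero when nothing is printed, which is not an error here.
  sort -u "$1" | grep -vxF -f "$2" || true
}
```

A merge gate built on this would block the PR whenever the output is non-empty.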

So it would seem that the CLA and DCO work on completely different ends of the process, at least the way that people implement them. The DCO is signed off on right when every commit is created with the -s option with git commit -s and otherwise the PR will be rejected (and fixing it after the fact can be very painful in some cases). Alternatively, the CLA signoff occurs at the very end when the PR is actually posted against a project (and is therefore more project-specific). So would you rather hassle developers on the frontend when they are creating their commits or on the backend when they are posting PRs and trying to get their contributions accepted? I see pros and cons here.

From what I can tell, CLA and DCO are compatible mechanisms. If you are developing software that you want to be open source, then it seems like you should get into the habit of signing all of your commits with git commit -s (and even better with git commit -s -S as part of this suggestion), which you could do with an alias called something like git-commit-s to make that simpler (and with auto-complete). But the fact that commits are signed and assert a DCO w.r.t. the license in the modified files does not preclude a project from also using a CLA and enforcing it in their PRs.
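One way to set up such a shortcut is a small shell function (or an equivalent Git alias). This is a sketch; the name `git_commit_s` is hypothetical, and the `-S` option additionally assumes that a signing key is already configured for Git.

```shell
# git_commit_s: commit with a Signed-off-by trailer (-s) and a cryptographic
# signature (-S), passing any further arguments through to git commit.
git_commit_s() {
  git commit -s -S "$@"
}

# Roughly equivalent Git alias, shown as a comment so that sourcing this
# file does not change any configuration:
#   git config --global alias.cs 'commit -s -S'
```

For example, `git_commit_s -m "Fix corner case"` runs `git commit -s -S -m "Fix corner case"`.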

Again, what concerns me about the DCO is that "this project", which is mentioned several times in https://developercertificate.org/, is ill-defined with Git commits since the same Git commit can be contributed to multiple Git repos/projects. But upon closer inspection, the standard DCO text is careful to only mention the open source license in the file(s) being modified themselves and is not really dependent on the contribution requirements of some larger "project" these files may be a part of. Alternatively, the CLA that is associated with a particular PR with a particular project is more project-specific. The tool https://github.com/contributor-assistant/github-action allows you to store the CLA signatures in a separate git repo to make sure the CLA is unique to a given project and not a git repo (which is not necessarily the same thing since a repo can be truly forked to create multiple projects with different contribution requirements).

What I don't understand is why you could not just have the author of a PR assert the DCO for an entire PR when the PR is created? That would avoid having to sign every single commit and the DCO is a less scary legal declaration than what is contained in most CLAs. There is no practical difference between squashing a bunch of commits (from various authors) and DCO signing that squashed commit vs. posting a PR and then asserting the DCO for the whole set of commits (from various authors). The way the DCO is worded, that should be okay and it would avoid the upstream problem of making sure everyone signs every commit. But I see the allure of embedding DCO signoffs at the commit level for the contributions themselves since that does not tie you to GitHub, GitLab or any other git hosting service. And for a project that needs to live for decades and go through several git hosting services, that may be important.

I think I know enough now to write up something useful with some good references.

bartlettroscoe commented 2 years ago

FYI: PR #1156 posts a short article on the DCO. I decided to focus on the DCO because there is already a CC article for contributor agreements on bssw.io. But the DCO article references some articles that directly compare the DCO to CLAs, as well as a lot of other info.

carns commented 2 years ago

> @carns, thanks for the additional info. What is your project on GitHub so I can see how you implemented this CLA workflow?

Whoops, sorry for not answering this sooner. You can see one example here:

https://github.com/mochi-hpc/mochi-margo/blob/main/.github/workflows/mochi-cla.yml

All of the repos in that mochi-hpc organization use the same setup, and they all store results in the same dedicated repo named mochi-cla-assistant, which is private. That way if someone agrees to the CLA on any of our repos they have agreed on all (the nature of the Mochi projects is that it is intentionally split into many small repositories containing components that can be combined according to the use case).

carns commented 2 years ago

Thanks @bartlettroscoe , that's a nice DCO article.

bartlettroscoe commented 2 years ago

> We automated the CLA agreement processing using the CLA assistant github action (https://github.com/contributor-assistant/github-action).

@carns, just out of curiosity, looking over the PRs in:

I don't see any that have the line "I have read the CLA Document and I hereby sign the CLA". For example, the query:

comes up empty.

Can you point me to a PR that shows this signing automation?

But I do see the CLA checks being run, for example, here:

showing:

Run cla-assistant/github-action@v2.1.3-beta
CLA Assistant GitHub Action bot has started the process
All contributors have signed the CLA 📝 ✅ 
dev-throttle

So it seems that the checks that all of the contributors have already signed a CLA are there. Is it just that the automatic management of signing the CLA and storing the signed document has not been exercised in this repo? (Just checking that the contributors have signed a CLA is pretty good.)

carns commented 2 years ago

We turned on this github action relatively recently, and so far every PR has come from someone who either has already signed a conventional form or is employed by ANL (and thus exempt). So for the most part it's just been confirming the allowlist so far.