OpenTermsArchive / contribution-tool

[prototype] Interactively declare services to be tracked in Open Terms Archive
European Union Public License 1.2
0 stars 3 forks source link

Credit contributor #135

Closed martinratinaud closed 1 year ago

martinratinaud commented 1 year ago

Fixes #134

Here is a first implementation done on the contribution tool to achieve this.

When user contributes for the first time, s.he is asked to enter h.is.er email to take credit for the change.

If s.he does, he will become a co-author on the commit

(Text in the gif is not up to date though) ota-contributor

@kissaki, I'm interested to know if you find the interface understandable enough.

Kissaki commented 1 year ago

How does it credit the authors with just the email? In Git a commit author is name + email.

Hard to see with the git immediately looping :P Looks like there is not only commit author but a co-authored-by footer too?

martinratinaud commented 1 year ago

Hi @Kissaki when the email is a GitHub email, no matter the name you pass in the Co-authored-by, it will associate your GitHub account.

Here is an example in a PR on our sandbox declaration repo

MattiSG commented 1 year ago

How does it credit the authors with just the email? In Git a commit author is name + email.

Indeed! In order to simplify the process and ask for only one piece of information, we just use a generic name. If the author has a GitHub account associated with the email, it will take over whichever name has been written anyway.

As a contributor, please let us know if you would prefer to input both your name and email rather than having this implementation 🙂

Looks like there is not only commit author but a co-authored-by footer too?

Since we use the GitHub API for creating the commit, I understand from @martinratinaud that we cannot set the commit author ourselves: the identity is simply derived from the GitHub token. Instead of asking contributors for a GitHub token, which is cumbersome and few people have, we instead use the Co-authored-by text message to credit all users, no matter if they have a GitHub account or not. In the future, we might offer the option of inputting a GitHub token (or signing in with GitHub), so that the PR opening itself is made on behalf of the user 🙂 but I wanted us to have a generic version covering most cases first.

Kissaki commented 1 year ago

Git commits have a commit author and a committer. I would expect you to be able to create commits as your bot user via the token, but declare a different commit author (name+email).

Co-authored-by is a ternary means, and allows associating multiple authors. It is not git-native, but is supported by various Git tools - including GitHub. Given that the Author field exists, it seems unnecessary to use the co-authored-by field - unless you want to explicitly declare the bot/tool the author, and the user the co-author.

GitHub accounts may be identified and displayed on GitHub, but that is presentation specific to the GitHub interface and presentation. Coming from a git perspective I would expect and want to specify my name and email. Using a pseudo name is technically wrong, and breaks as soon as you leave GitHub.

We should also consider that GitHub offers email aliases for pseudo-anonymous commits (not publishing your own email). Entering that manually is not really feasible. A GitHub auth to retreive only the name + email would probably be best for regular GitHub users.

Using a name input defaulting to Anonymous and an email input defaulting to anonymous.contributor@opentermsarchive.org seems like a good first iteration/simple baseline to me.

Something else to consider is whether we want to simply accept unverified, unauthenticated email addresses. People could potentially specify email addresses they do not own. Authing via GitHub would remedy this, but only for people who do have a GitHub account.


I do not have access to the sandbox declaration repo

MattiSG commented 1 year ago

Given that the Author field exists, it seems unnecessary to use the co-authored-by field

I agree with you, as well as about remarks on how this setup is specific to GitHub. As mentioned in my previous message, I had understood that we could not do this unless logging the user in through GitHub due to a limitation in the Octokit API.

However, looking at the octokit.rest.repos.createOrUpdateFileContents documentation, it seems that one can pass a author.name and author.email. @martinratinaud is there thus any reason why we don't use those? 🙂

unless you want to explicitly declare the bot/tool the author, and the user the co-author.

I believe it is relevant for the tool to be listed, since authors might have intended something different than what the tool created on their behalf. In that case, however, setting the committer information to the bot would seem the most relevant to me.

We should also consider that GitHub offers email aliases for pseudo-anonymous commits (not publishing your own email). Entering that manually is not really feasible.

GitHub users can find their pseudonymous email address in their account settings (“emails” section), or derive them from their user id in the form ID+USERNAME@users.noreply.github.com. However, this is a bit cumbersome to find.

We could use the GitHub API to retrieve the noreply email from the user ID, potentially through authentication, but I don't believe the cost-to-benefit ratio of that solution is good.

Something else to consider is whether we want to simply accept unverified, unauthenticated email addresses.

At this stage, I believe this risk is acceptable. The impersonation risk is not significantly higher than through any other Git commit means. An attacker could forge a commit crediting an impersonated author name and email and open a pull request for it. The only difference is that here, we cannot identify the GitHub user that opened the pull request. However:

  1. We have means to contact the credited author in case of doubts emerging from their authorship (attack scenario mitigated: make an offensive contribution, attribute it to target).
  2. If the tool is marked as committer, authors have plausible deniability of their actions (attack scenario mitigated: track documents of a non-cooperative service provider, attribute them to employee of service provider).

Authing via GitHub would remedy this, but only for people who do have a GitHub account.

I would prefer that we refrain from adding GitHub authentication at this stage, for two reasons:

  1. I understand that setting up OAuth would require us to operate a dedicated database, while the contribution tool does not currently store any data.
  2. The vendor-lock-in to GitHub is already too strong. Let's avoid adding more.

I do not have access to the sandbox declaration repo

Unless other @OpenTermsArchive/core devs believe it would be more useful to open it, I believe it is important that this sandbox remains private to @OpenTermsArchive/core. This repository has been set up precisely to offer an opportunity to try out unstable features and track documents without risking being confused for production data. The current level of discoverability of production repositories is low, and sandbox and contrib could be very difficult to distinguish, for example.


Based on this conversation, @martinratinaud what would you think of the following:

? 🙂

martinratinaud commented 1 year ago

Thanks a lot @Kissaki and @MattiSG for this suggestions, which I all agree with. I will implement what's described.

martinratinaud commented 1 year ago

Here is how the new implementation looks like ota-contributor-2

If you are fine with it @MattiSG please review the copy and I can merge it tomorrow

martinratinaud commented 1 year ago

Thanks a lot @Kissaki for your help and insights. The feature is live on the contribution website and we've been wanting for a long time :-).