devlooped / SponsorLink

SponsorLink: an attempt at OSS sustainability
https://www.devlooped.com/SponsorLink
MIT License

RFC: A Proposal to Fix Major Privacy Risk #48

Closed (KabirAcharya closed this issue 1 year ago)

KabirAcharya commented 1 year ago

Issue

SponsorLink currently operates by sending email hashes to a server to track sponsorship status. This is not a great idea for multiple reasons. Some of these reasons include:

Proposal

I propose a simple solution that avoids sending any data about users/developers to any server. The final choice is of course yours, but I welcome community input on which option is "better" and on any beneficial changes, hence the RFC. My solution involves 3 logical components, though you may choose to unify them in terms of code-base/server. They do not need to live under the same namespace as your own devlooped/sponsors; I have just used that as an example for now.

Client

The client is responsible for validating that the current GitHub user is a sponsor of the project. It should be completely OSS and can be distributed via a package manager.

Process:

  1. Retrieve a user's GitHub username from:
    • ssh -T git@github.com
    • git config credential.username
    • Otherwise request user to authorize with Git/SSH to enable SponsorLink support.
  2. Retrieve repository author's username from:
    • git ls-remote --get-url origin
  3. Retrieve repository author's ID from:
  4. Check if the username is in https://github.com/devlooped/sponsors/tree/<author-id>/sponsors.md
  5. Carry out post-check SponsorLink logic.
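To make the client steps above a little more concrete, here is a minimal Python sketch of the parsing and lookup parts only. It assumes the outputs of `ssh -T git@github.com` and `git ls-remote --get-url origin` have already been captured as strings, and that `sponsors.md` holds one login per line (optionally as a bullet). That file layout and all function names are hypothetical, not part of SponsorLink.

```python
import re
from typing import Optional


def username_from_ssh_banner(banner: str) -> Optional[str]:
    """Pull the login out of GitHub's "Hi <login>! You've successfully..." greeting."""
    match = re.search(r"Hi ([A-Za-z0-9-]+)!", banner)
    return match.group(1) if match else None


def owner_from_remote_url(url: str) -> Optional[str]:
    """Extract the repository owner from an HTTPS or SSH GitHub remote URL."""
    match = re.search(r"github\.com[:/]+([^/]+)/[^/]+?(?:\.git)?/?$", url.strip())
    return match.group(1) if match else None


def is_sponsor(username: str, sponsors_md: str) -> bool:
    """Check a login against the assumed one-login-per-line sponsors.md format."""
    # Strip optional bullet markers; GitHub logins are case-insensitive.
    listed = {line.strip().lstrip("-* ").lower() for line in sponsors_md.splitlines()}
    return bool(username) and username.lower() in listed
```

Fetching `sponsors.md` for the resolved author and the post-check logic are left out, since the proposal does not pin those down.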

NOTE: I highly recommend not implementing any build slowdown. It goes against the entire premise of OSS sponsorships to punish developers for not sponsoring a project.

Setup Bot/Server

This bot is responsible for setting up the initial environment for an author to use SponsorLink and for the update bot to keep track of them.

Process:

  1. Any author who wishes to add SponsorLink to their repository should connect their GitHub account to the bot, rather than the current arrangement of requiring every developer to connect their GitHub account.
  2. The author's GitHub ID can be collected from the API, only requiring the read:user scope to first get their username.
  3. A branch on https://github.com/devlooped/sponsors/tree/<author-id> is created with an empty sponsors.md (or an extra pass of the update logic for initial population).

Update Bot/Server

This bot is responsible for updating sponsors.md for each branch of a "sponsors" repository.

Process:

  1. List all of the numerical branches in devlooped/sponsors.
  2. Query the username using the API:
  3. Choose:
    • If viable, use the custom logic for devlooped-bot to list sponsor usernames for each branch and update their respective sponsors.md files (with a plaintext/decodable list).
    • Use this query (or similar) to the GitHub GraphQL API to update sponsors.md files (you will need to have a cursor to handle accounts with more than 100 sponsors).
      query {
        organization(login: "devlooped") {
          hasSponsorsListing
          sponsors(first: 100) {
            edges {
              node {
                ... on User {
                  login
                }
                ... on Organization {
                  login
                }
              }
            }
          }
        }
        user(login: "devlooped") {
          hasSponsorsListing
          sponsors(first: 100) {
            edges {
              node {
                ... on User {
                  login
                }
                ... on Organization {
                  login
                }
              }
            }
          }
        }
      }
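The cursor handling mentioned above could look roughly like the following Python sketch. It follows the `pageInfo { hasNextPage endCursor }` and `after:` pattern that GitHub's GraphQL connections expose. `run_query` is a hypothetical callable (not a real library function) that sends a query plus variables to the API and returns the `data` object; only the `user(...)` branch is shown, and an organization account would need the `organization(...)` equivalent.

```python
from typing import Callable, Dict, Iterator, Optional

SPONSORS_QUERY = """
query($login: String!, $after: String) {
  user(login: $login) {
    sponsors(first: 100, after: $after) {
      pageInfo { hasNextPage endCursor }
      nodes {
        ... on User { login }
        ... on Organization { login }
      }
    }
  }
}
"""


def iter_sponsor_logins(
    run_query: Callable[[str, Dict[str, Optional[str]]], dict],
    login: str,
) -> Iterator[str]:
    """Yield every sponsor login, following the cursor 100 entries at a time."""
    after: Optional[str] = None
    while True:
        data = run_query(SPONSORS_QUERY, {"login": login, "after": after})
        page = data["user"]["sponsors"]
        for node in page["nodes"]:
            # Defensively skip entries that carry no login.
            if node and "login" in node:
                yield node["login"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        after = page["pageInfo"]["endCursor"]
```

Injecting `run_query` keeps the paging logic independent of any particular HTTP client or authentication setup.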

I hope you take these suggestions into consideration and am looking forward to all feedback from anyone!

wrexbe commented 1 year ago

It could just be simpler, and use environment variables

IAmSponsoring="library1,library2,library3" or to opt out IAmNotGoingToSponsor="library3,library4"

If someone isn't going to sponsor, then it doesn't matter how much you bug them about it. This will at least make people think about it. You could even make it a license condition to set one or the other, and to not lie.
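As a rough illustration of the environment-variable idea, a library could classify itself with something like the Python sketch below. The variable names `IAmSponsoring` and `IAmNotGoingToSponsor` come from the comment above; the comma-separated parsing, case-insensitive matching, and the three return values are assumptions made for the example.

```python
import os
from typing import Mapping, Optional, Set


def _parse_list(value: Optional[str]) -> Set[str]:
    """Split a comma-separated value into a set of lowercased, trimmed names."""
    return {item.strip().lower() for item in (value or "").split(",") if item.strip()}


def sponsorship_declaration(library: str, env: Mapping[str, str] = os.environ) -> str:
    """Return "sponsoring", "opted-out", or "undeclared" for the given library."""
    name = library.lower()
    if name in _parse_list(env.get("IAmSponsoring")):
        return "sponsoring"
    if name in _parse_list(env.get("IAmNotGoingToSponsor")):
        return "opted-out"
    return "undeclared"
```

Only the "undeclared" case would prompt the user, which matches the intent of making people think about it once rather than nagging them repeatedly.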

kzu commented 1 year ago

@KabirAcharya thanks a lot for the detailed write-up.

Please take a look at the current most-likely approach discussed in another issue.

I think using a GitHub repo and branches for the sponsors list/validation may not only fail to scale sufficiently, but would also suffer from the fact that not all sponsorships are public. You don't want to leak sponsors lists from sponsors who requested to remain anonymous.

At that point, if you make the entire repo private, you're just trying to use GitHub as a database or blob storage, and I'd rather just use an Azure storage account for that instead.

The mentioned issue explores what a k-anonymity scheme would look like for the client-side verification.
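The actual scheme from that issue is not reproduced in this thread. For readers unfamiliar with the term, the generic hash-prefix form of k-anonymity can be sketched in Python as below: the client sends only a short prefix of a hash, the server returns every known hash sharing that prefix, and the final comparison happens locally. The prefix length, the use of SHA-256 over a normalized email, and all function names are assumptions for illustration, not SponsorLink's design.

```python
import hashlib
from typing import Callable, Iterable, Set

PREFIX_LEN = 5  # hypothetical: number of hex characters revealed to the server


def email_hash(email: str) -> str:
    """Hash a normalized (trimmed, lowercased) email with SHA-256."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def bucket_for_prefix(prefix: str, all_hashes: Iterable[str]) -> Set[str]:
    """Server side: return every known sponsor hash that shares the prefix."""
    return {h for h in all_hashes if h.startswith(prefix)}


def is_sponsor_k_anon(email: str, fetch_bucket: Callable[[str], Set[str]]) -> bool:
    """Client side: reveal only the prefix, then match the full hash locally."""
    full = email_hash(email)
    return full in fetch_bucket(full[:PREFIX_LEN])
```

With this shape the server only learns which bucket was requested, never which entry in it (if any) belongs to the caller.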

I think skipping the verification should also be done very explicitly via something like gh sponsors disable, where we get a chance to explain the importance of sponsoring to make your supply chain sustainable in the long run and what-not. The mechanism it uses internally to turn off the analyzer should be just an implementation detail.

Note that by making the sponsorlink tool a GH CLI extension, we can assume/require the user to be properly authenticated, so we don't need ssh or other commands to set things up on their side. See https://github.com/devlooped/gh-sponsors for the initial work on that.

If the direction seems reasonable to you, you might consider closing this issue?

kzu commented 1 year ago

@wrexbe I like the simplicity! I think such a "license condition" would make it a non-OSI approved license at that point, no?

wrexbe commented 1 year ago

Idk, but how important even is that.

kzu commented 1 year ago

Custom licenses have to go through legal, most likely.

wrexbe commented 1 year ago

I've seen products that are free but require you to show something like "Powered by Moq" somewhere. Maybe you could do something similar if someone is not a sponsor.

kzu commented 1 year ago

I'm closing this for now since we've moved to a signed local manifest, are no longer storing any user/sponsor information, and are adding the ability to remove all user traces on demand too.