GenerousLabs / brainstorming-encrypted-git

Brainstorming how to build encrypted git remotes on top of isomorphic-git
GNU Affero General Public License v3.0

Brainstorming privacy designs #2

Open chmac opened 3 years ago

chmac commented 3 years ago

Following #1, what privacy tradeoffs make sense?

git-remote-gcrypt obscures everything by encrypting the packfiles and replacing the remote's single commit with a new one on each push. This is very far toward the privacy end of the spectrum, but it introduces tradeoffs.

For mobile-first applications that use git to store data, which tradeoffs would make sense?

chmac commented 3 years ago

As a mobile user...

I want to be able to push small changes to a large repository without pushing all the files again.

I want to be able to fetch a smaller subset of files without having to sync the whole repository.

I am willing to place some trust in my git host. Their access logs will expose metadata about when I read and write data anyway. They will also most likely know some real contact information for me (name, email, etc.) and can approximate my location from my IP address.

I would like the contents of my files, their names, and my commit messages to be encrypted and invisible to my git host. I might tolerate my git commit times and git email address being exposed to my git host.

chmac commented 3 years ago

Idea: Could we encrypt each commit?

What would that look like? A commit points to a tree of objects (subtrees and blobs). If we encrypt the objects themselves, we'll need to get into the guts of git, and that won't work with standard git hosting services, which parse commits and trees to check that a push is complete.
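
To see why encrypting the objects themselves clashes with git, it helps to recall how object IDs are computed. This minimal sketch reproduces git's loose-object hashing for SHA-1 repositories: the ID is the SHA-1 of a `"<type> <size>\0"` header plus the raw content. The "ciphertext" below is a placeholder, not real encryption.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to an object.

    Git hashes a header "<type> <size>\\0" followed by the raw content,
    so the ID depends on every byte of the object.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# A blob's ID is derived from its plaintext bytes...
plain_id = git_object_id("blob", b"hello world\n")
# ...so encrypted bytes get a completely different ID.
# (Placeholder bytes standing in for ciphertext, not real encryption.)
cipher_id = git_object_id("blob", b"\x8f\x01\x9a placeholder ciphertext")

print(plain_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(plain_id != cipher_id)  # True
```

Trees and commits refer to their children by these IDs, so an encrypted tree or commit is no longer an object a standard host can parse or follow.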

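We could consider something like git-crypt, where files are encrypted by a clean filter when they are staged and decrypted by a smudge filter when they are checked out into the working directory. We could potentially also encrypt the commit messages, but not the other parts of the commit. Note that git-crypt does not encrypt file names, so those would still be visible to the host.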
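
A clean filter has to be deterministic: if staging an unchanged file produced new ciphertext each time, git would report every encrypted file as modified. git-crypt handles this with AES-256 in CTR mode and a synthetic IV derived from an HMAC of the file. The sketch below illustrates the same idea with Python's standard library, which has no AES, so it swaps in HMAC-SHA256 as the counter-mode keystream. It shows the shape of a deterministic filter only; it is not git-crypt's file format or production cryptography, and the function names and key handling are hypothetical.

```python
import hashlib
import hmac
import os

IV_LEN = 16


def _keystream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    """Counter-mode keystream with HMAC-SHA256 as the PRF (a stand-in for AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def clean(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical clean filter: the IV is derived from the content, so output is deterministic."""
    iv = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:IV_LEN]
    stream = _keystream(enc_key, iv, len(plaintext))
    return iv + bytes(p ^ k for p, k in zip(plaintext, stream))


def smudge(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Hypothetical smudge filter: decrypt, then check the IV still matches the content."""
    iv, body = blob[:IV_LEN], blob[IV_LEN:]
    stream = _keystream(enc_key, iv, len(body))
    plaintext = bytes(c ^ k for c, k in zip(body, stream))
    expected = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:IV_LEN]
    if not hmac.compare_digest(iv, expected):
        raise ValueError("ciphertext was modified or the keys are wrong")
    return plaintext


enc_key, mac_key = os.urandom(32), os.urandom(32)
first = clean(enc_key, mac_key, b"secret notes\n")
second = clean(enc_key, mac_key, b"secret notes\n")
print(first == second)  # True: re-staging an unchanged file gives the same blob
print(smudge(enc_key, mac_key, first))  # b'secret notes\n'
```

The tradeoff of determinism is that equal ciphertexts reveal equal plaintexts, so the host can tell when two files, or two versions of a file, are identical.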

chmac commented 3 years ago

Idea: Copy the plaintext repo to an "encrypted" repo.

What does that mean? In a similar spirit to git-remote-gcrypt, we could encrypt files into a second git repository and copy commits over from the "clean" repository.

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)

What happens if I pull a whole repo and then push it to an encrypted remote? We would need to walk the entire history, every commit and its tree, and create an encrypted copy.
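
A minimal sketch of that walk, assuming a hypothetical in-memory model where each object ID maps to its type and the IDs it references (a commit references its root tree and parents, a tree its blobs and subtrees). In a real implementation the objects would be read from the repository instead.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical repo model: object ID -> (type, referenced object IDs).
Objects = Dict[str, Tuple[str, List[str]]]


def reachable_objects(objects: Objects, tips: List[str]) -> Set[str]:
    """Breadth-first walk from the branch tips over the whole history."""
    seen: Set[str] = set()
    queue = deque(tips)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue  # shared trees and blobs are visited only once
        seen.add(oid)
        _obj_type, refs = objects[oid]
        queue.extend(refs)
    return seen


repo: Objects = {
    "c2": ("commit", ["t2", "c1"]),  # second commit: tree t2, parent c1
    "c1": ("commit", ["t1"]),        # root commit
    "t2": ("tree", ["b1", "b2"]),
    "t1": ("tree", ["b1"]),          # b1 is unchanged between commits
    "b1": ("blob", []),
    "b2": ("blob", []),
}

to_encrypt = reachable_objects(repo, ["c2"])
print(sorted(to_encrypt))  # ['b1', 'b2', 'c1', 'c2', 't1', 't2']
```

Because unchanged blobs and trees are shared between commits, each object is visited once, so the encrypted copy needs one encrypted object per git object rather than a full snapshot per commit.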

How would we get commits back from the encrypted remote? We could build a decrypted version of the content, but not necessarily the git history. Hmm, yeah, maybe this approach doesn't work... 🤔

chmac commented 3 years ago

Idea: What about encrypting all objects and creating our own object store?

It's unclear whether, or how, that would work with GitHub et al. Could we push arbitrary objects and get them back later? A push only sends objects reachable from the refs being pushed, and git's garbage collection prunes unreachable objects. Maybe it's possible to work around that somehow. 🤔

What about creating synthetic "trees" for the encrypted remote?

So the "tree" in the encrypted repo, if checked out, would look like:

We could also shard this, as git does with its own .git/objects directory. Adding a new object to the "clean" git repo would then result in a new encrypted object being created in the "encrypted" repo.
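
One way the sharded synthetic tree could be laid out, assuming each encrypted object becomes one file named after a keyed hash of its plaintext object ID (an assumption for illustration; the function name is hypothetical). Keying the name matters because plaintext blob IDs are just hashes of the content, so a host seeing raw IDs could confirm whether the repo contains a file it already knows.

```python
import hashlib
import hmac


def encrypted_object_path(name_key: bytes, plain_oid: str, shard_chars: int = 2) -> str:
    """Hypothetical path for an encrypted object inside the synthetic tree.

    The file name is an HMAC of the plaintext object ID, so the host cannot
    match it against known git object IDs. It is sharded by prefix the way
    git shards .git/objects (first two hex characters as a directory).
    """
    name = hmac.new(name_key, plain_oid.encode(), hashlib.sha256).hexdigest()
    return f"objects/{name[:shard_chars]}/{name[shard_chars:]}"


key = b"\x00" * 32  # placeholder key for illustration only
path = encrypted_object_path(key, "3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
print(path)
```

Two hex characters give 256 shard directories, matching git's loose-object layout, which keeps any single synthetic tree from growing too large as objects accumulate.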

What would the workflow look like?

We could run the above while skipping any objects that already exist, since each object only needs to exist once.
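
A minimal sketch of the skip-existing step, assuming the remote's existing object paths can be listed and that reading, encrypting, and naming objects are supplied as helpers. All helper names are hypothetical, and the toy `encrypt` just reverses bytes so the example runs; it is not encryption.

```python
from typing import Callable, Dict, Iterable, Set


def sync_objects(
    local_oids: Iterable[str],
    remote_paths: Set[str],
    read_object: Callable[[str], bytes],
    encrypt: Callable[[bytes], bytes],
    path_for: Callable[[str], str],
) -> Dict[str, bytes]:
    """Return only the encrypted files that are new on the remote."""
    additions: Dict[str, bytes] = {}
    for oid in local_oids:
        path = path_for(oid)
        if path in remote_paths or path in additions:
            continue  # each object only needs to exist once
        additions[path] = encrypt(read_object(oid))
    return additions


# Toy stand-ins so the sketch runs; NOT real encryption.
store = {"a1": b"first", "b2": b"second", "c3": b"third"}
remote = {"objects/a1"}  # a1 was pushed earlier
new_files = sync_objects(
    ["a1", "b2", "c3", "b2"],
    remote,
    read_object=store.__getitem__,
    encrypt=lambda data: data[::-1],
    path_for=lambda oid: f"objects/{oid}",
)
print(sorted(new_files))  # ['objects/b2', 'objects/c3']
```

Since nothing already on the remote is rewritten, the upload is proportional to the new objects, which fits the goal of pushing small changes to a large repository.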

chmac commented 3 years ago

Why encrypt every object instead of just encrypting the packfiles like git-remote-gcrypt does?

Packfiles are more space efficient, but they change, and whenever they change, the encrypted version changes too. That means we always need to pull everything from the encrypted remote before pushing, or one device's push could overwrite another's. If instead we encrypt every object individually, we can push safely, knowing that we are only ever adding objects to the encrypted repo. It becomes an append-only object store in which every object is stored as an encrypted blob.
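
A toy model of why the append-only store tolerates pushing without pulling first. Objects are represented as sets of IDs, and two hypothetical devices push concurrently. The model deliberately ignores refs; deciding which commit a branch points at still needs coordination in either design.

```python
from typing import Set


def push_replace(remote: Set[str], local: Set[str]) -> Set[str]:
    """Model of a push that replaces the remote's encrypted state wholesale."""
    return set(local)  # whatever was on the remote is discarded


def push_append(remote: Set[str], local: Set[str]) -> Set[str]:
    """Model of an append-only push: objects are only ever added."""
    return remote | local


base = {"o1", "o2"}
phone = base | {"o3"}   # phone adds o3 without fetching first
laptop = base | {"o4"}  # laptop adds o4 without fetching first

replaced = push_replace(push_replace(base, phone), laptop)
appended = push_append(push_append(base, phone), laptop)

print(sorted(replaced))  # ['o1', 'o2', 'o4'] (o3 was lost)
print(sorted(appended))  # ['o1', 'o2', 'o3', 'o4']
```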