elasticdog / transcrypt

transparently encrypt files within a git repository
MIT License
1.43k stars 102 forks source link

Configurable Security Improvements & OpenSSL 3.x Support & Python API #132

Open Erotemic opened 2 years ago

Erotemic commented 2 years ago

Closes #133 #55

Supersedes #126

Introduction

This PR seeks to increase the security of transcrypt while also maintaining backwards compatibility (two opposing goals that made this tricky).

Currently transcrypt is using some weak encryption settings that contain several vulnerabilities with design goals that make solving them challenging:

To address these issues I've added the following config / CLI variables

Additionally, I've added support for these in the display, gpg import/export. Configuration salt gets re-randomized on a rekey. There is a --config-salt method the user can also use to specify their salt manually.

In addition to these security improvements I've added support for OpenSSL V3.x, which requires transcrypt to manually prepend the salt encoding before we convert to base64 and save. Decryption still works normally. I simply introduced a wrapper that checks the version.

Lastly, to test these new features I implemented a Python API around transcrypt. There are still a few things that could be improved, but the basic functionality is there. I use this along with the PythonGit, gpg_lite, and ubelt Python packages to implement new tests of these features. We could use this to publish a package in pypi that will allow users to install trancrypt via pip. Personally, I like the idea of pip install transcrypt and having it install a python package as well as the primary bash script itself.

New Features

This PR adds:

Details

@elasticdog @jmurty

I have a proof of concept that reworks the transcrypt config in a backwards compatible way. This should be considered a work in progress, but it is nearly complete and feedback at this stage would be helpful.

I've also written a small Python wrapper API to that allows me to test a wide range of the above configurations to make sure they all work correctly.

The main difference from a default user's perspective is that they will now be asked to configure more settings when they run transcrypt as is. All of the new options can be specified on the command line so running:

transcrypt -c 'aes-256-cbc' -p 'correct horse battery staple' -md 'md5' --use-pbkdf2 '0' -sm 'password' -cs '' -y

would decrypt the repo (assuming the correct password) without any interaction.

I made a lot of changes to the main file in order to support this. I tried to consolidate duplicated code that needed the entire parameter configuration. Previously this wasn't so bad because it was only 2 variables. Now there are 6 and I'm not 100% set on them as they are, so I wanted to write it in such a way that it wouldn't be too hard to change the details of the scheme I'm using.

I've prefixed the new helper functions I've written with an underscore (e.g. _get_user_input) to help make writing to help make get_cipher and corresponding other functions easier. The new code for those functions looks like:

# ensure we have a cipher to encrypt with
get_cipher() {
    local prompt
    prompt=$(printf 'Encrypt using which cipher? [%s] ' "$DEFAULT_CIPHER")
    _get_user_input cipher "$DEFAULT_CIPHER" "validate_cipher" "$prompt"
}

I'm not sure if there is any security issue passing around variables by name in bash like this. My bash is pretty good, but I'm a bit shy of an expert.

There was also a place that was using "current_" instead of just "" and it wrote custom code to load the entire configuration. I think just using <varname> everywhere and putting all of the code to load the state into one function (i.e. _load_transcrypt_config_vars) that can be called wherever it is needed should also work and be easier to work with. But I wanted to flag this in case I missed something and "current_password" and "current_cipher" had those local variable names for a reason.

Note that there is also a bash helper file that is also checked in. This is just a scratchpad I was using to ensure my bash was correct. I plan to remove it once I finish this.

I still need to write more tests, clean everything up, and ensure my tab/space usage is consistent with the existing script, but any general comments on structure or obvious issues would be helpful.

Discussion

On the salt method, I'm still somewhat unhappy with this method, but I can't think of anything better. I think we need to have some partial configuration file that ships with the repo.

jmurty commented 2 years ago

Hi @Erotemic thanks for starting work on this, I'll aim to help out when I can though I don't tend to have a lot of time.

One suggestion: could you update this PRs description with a brief summary of the specific problem(s) this PR will solve? I'm familiar with your goals from prior discussions across tickets, PRs, email etc but I think it would be helpful to give some context up front, so others can follow along and contribute.

From my perspective, this PR is working towards enabling PBKDF2 for stronger encryption similar to #126 but with a different salting mechanism to avoid potential weaknesses discussed in #55 where the strength (and value) of using PBKDF2 could be bypassed, plus some refactoring and cleanup.

Erotemic commented 2 years ago

@jmurty Thanks for taking a look.

Yes, I'll make a high level motivation write-up. I can make single issues that this PR will close.

On the technical side, my recent pushes add support for OpenSSL 3.x (which was a huge pain, they removed the salted prefix) while still being backwards compatible with 3.x.

I've refactored the bash_helpers.sh file into transcript_library.sh that contains most of the new helper functions I've written with unit tests and more documentation. Stripped versions of these functions are also in the transcrypt executable itself. It would be nice to have a set of library calls to keep the size of the main transcrypt logic smaller, but having more than one file in a bash project seems like a packaging nightmare.

I'm not sure how to run the .bats style tests, so I've written all new tests in Python. I hope adding that as a test time dependency is ok. It's easy enough to get a Python env on any CI, so I hope it's not a problem. The Python tests let me define a grid:

    basis = {
        'cipher': ['aes-256-cbc', 'aes-128-ecb'],
        'password': ['correct horse battery staple'],
        'digest': ['md5', 'sha256'],
        'use_pbkdf2': ['0', '1'],
        'salt_method': ['password', 'configured'],
        'config_salt': ['', 'mylittlecustomsalt'],
    }

And then I can execute each test (currently encryption round trip, export/import gpg key, and rekey - which covers most functionality), over each of these 32 configuration settings.

Also, while writing the Python tests, I basically ended up writing Python bindings for the transcrypt API. It might make sense to distribute this as a Python package. It also wouldn't be hard for me to get that setup without breaking compatibility if you'd like.

Erotemic commented 2 years ago

I think this PR is ready. Can @elasticdog or @jmurty approve this for dashboards?

jmurty commented 2 years ago

Hi, there's quite a lot here to digest, I'll start to work my way through it.

One thing I'm concerned about is rushing to an implementation, when we might benefit from stepping back and thinking through the algorithm and techniques in a more abstract way, to arrive at the best future-proof approach given Transcrypt's requirements and desirable characteristics (strong encryption with fast performance).

I'll open another ticket to raise this question and ask for feedback.

For example, I'm wondering if it might make sense to decouple password key derivation from the per-file encryption step, such that we pre-generate the actual key and IV from the password once – at init time – using a strong and slow technique, then provide the key (-K) and IV (-iv) to the OpenSSL enc command instead of the original password with the pbkdf2 flag? Or is there a benefit to re-deriving the key from the password using relatively slower pbkdf2 every time a file is encrypted?

Erotemic commented 2 years ago

I think a step back makes sense, having an abstracted description of the algorithm will be critical for determining the correctness of the implementation. I'll see if I can pull something together.

I think it's also worth getting feedback on the public API I've chosen. I don't think --use-pbkdf2=1 --salt-method=configured --custom-salt='' are the best CLI names in the world. This first pass was just to get somI ething clear, usable, and it should be written in a way that it won't be hard to modify names or behaviors of the flags.

I think your idea about generating the key and iv securely once and then getting faster encrypt / decrypt times is a good one. There isn't much of a downside that I can see. Security against brute force should be the same. If they have your local git config you've already lost anyway, so it would only improve speed (unless I am missing something). But the downside is that would make this implementation more complex than it is, it wouldn't fundamentally change the functionality implemented here, and I've been using pbkdf2 on my fork for over a year and I've never had issues with slowness. Perhaps it is critical for some workflows, but my thought is encrypted files aren't changing that often, so the added complexity isn't worth it here (I do need to finish up on this and move onto other tasks). But I do think that feature should be on the roadmap. It might be a good idea to brainstorm about what that roadmap though.

The other thing to note is that transcrypt will break on everyone who recently upgraded to Ubuntu 22.04 LTS. So the sooner a patch that fixes that issue goes out the better. If necessary it shouldn't be hard to port that change to a separate PR, but I'd rather not do that myself. What I will do is fix the tests and try to boil the logic down to a simplified document.

jmurty commented 2 years ago

The breaking OpenSSL change was a nasty surprise to me, thanks for finding and fixing that. I'll backport your fix to make it releasable sooner.

And if you could write up an issue to discuss the new PBKDF2 + salt approach as a spec, that would be great.

I'm out of my depth with the intricacies of crypto algorithms and risks / trade-offs, so the more eyes and feedback we can get the better.

Erotemic commented 2 years ago

@elasticdog @jmurty

I've put together a draft of an algorithmic writeup and checked it into this PR under docs/algorithm.rst Please take a look here: https://github.com/Erotemic/transcrypt/blob/enhance-configuration/docs/algorithm.rst

Also, feel free to directly commit any edits to this document. I did try to focus on the algorithms, but my thinking always tends to drift towards implementation, so you may notice me doing that here and there (I always feel compelled to make the names in the implementation and docs line up).

Now that I've written this draft my thought is that we may want to morph it into some sort of "transcrypt enhancement proposal" that finalizes the details of the open question I've outlined, from which I or someone else can write the implementation that agrees with it.

I will note that I've spent quite a bit of my vacation on this, and while I do truly enjoy this sort of work, I will only have a limited amount of time to dedicate to this (among the other FOSS projects I work on) moving forward, so any help with finalizing details on:

would be helpful as I could remove the open questions, write the candidate implementation, and ensure my fork will likely match whatever will (hopefully) be accepted into the main repo would be greatly helpful to me in my day-to-day.