elasticdog / transcrypt

transparently encrypt files within a git repository
MIT License
1.46k stars 102 forks source link

Working towards Transcrypt version 3 with support for PBKDF2 key-derivation #162

Open jmurty opened 1 year ago

jmurty commented 1 year ago

This is (yet another) pull request focussed on updating transcrypt to support the much more secure PBKDF2 key-derivation algorithm.

Although this is the latest of many attempts, I think the progress in this PR is very promising and has a good chance of leading to the long-awaited version 3 of transcrypt with stronger modern encryption.

For anyone keen for this to happen, I would appreciate help in sense-checking the conceptual approach for the new PBKDF2 key-derivation feature – especially for "Deterministic salting" – as well as testing of the implementation so far. Thanks for your assistance.

I also need to thank Jon Crall (@Erotemic) in particular for pushing for these improvements, doing the hard work of discussing, considering, and documenting problems and potential solutions, and for building a working version in https://github.com/elasticdog/transcrypt/pull/136 that is heavily incorporated into the code here. This would not be happening at all without these efforts.


WARNING

This code branch is still young, is in flux and likely to change, and it is doesn't yet have adequate test coverage or documentation. It should be ready for testing, but do NOT use it for production or on repositories you care about.

Get started

Get the transcrypt script from this PR and:

Existing transcrypt-encrypted repositories should continue to work, and the command format should be backwards compatible so you can initialise an existing repo with e.g. ./transcrypt -c aes-256-cbc -p 'correct horse battery staple'

To enable PBKDF2 key-derivation you must set four new configuration settings: kdf, digest, project salt, and iterations.

In interactive mode – or after running ./transcrypt --rekey to change encryption settings in an existing repo – opt in by answering yes to the new "Use a key derivation function for best security?" option then set these values when prompted.

In command mode, use the new arguments like so: --kdf pbkdf2 --digest sha512 --iter 256_000 --salt 5J0QY8uOTe7/B9eYSJ2kOy91

The example _sensitivefile in this repository has been updated to use PBKDF2, initialise it with: ./transcrypt -c aes-256-cbc -md sha512 -k pbkdf2 -n 256_000 -ps 5J0QY8uOTe7/B9eYSJ2kOy91 -p 'correct horse battery staple'

Goals

Key goals for this work:

Deterministic salting for PBKDF2 (Feedback needed)

For transcrypt to work:

The decryption -> encryption process on an unchanged file must be deterministic for everything to work transparently. To do that, the same salt must be used each time we encrypt the same file.

Up until now, the deterministic per-file salt has been the last 16 hex bytes of a HMAC-SHA256 of the file's name, content hash, and the raw password.

Per discussion started by @andersjel in a comment on issue #55 keeping this approach when using PBKDF2 key-derivation would lead to a weakness where an attacker could brute-force guess the password using relatively cheap HMAC-SHA256 operations, bypassing the advantages of the more expensive PBKDF2 algorithm.

The current approach in this PR is to replace the raw password used in the HMAC-SHA256 operation, with an alternative "project salt" value which is generated at init time and stored in the Git config transcrypt.project-salt. The project salt is a randomly generated base64-encoded 18 byte value by default, or is provided by the user.

The project salt value must be provided by the user to decrypt a previously-initialised repository when using PBKDF2, or it may be stored and available in a new in-repo configuration storage mechanism (see here).

TODO Provide more details and documentation about the planned approach

Project salt approach abandoned July 2023: derive project salt from password

The current approach in this PR is to replace the raw password used in the HMAC-SHA256 operation, with an alternative "Project salt" value which is generated at init time and stored in the Git config transcrypt.project-salt. The project salt is generated by encrypting the raw password with itself and using the exact same PBKDF2 settings as all other files in the repository.

The idea – which is hopefully correct? – is that deriving the project salt from the password in this way does not introduce a weakness because an attacker would need to invest just as much brute forcing the partial salt value used for an encrypted file as they would brute forcing any other file in the repository.

The advantage of this approach is that the project salt can be derived the same settings already needed to initialise transcrypt for PBKDF2, without the need for extra configuration options.

An alternative approach discussed in PR #136 and documented here is to use a random (non-derived) value for the extra salt, with an additional configuration option or in-repo storage needed to apply it when initialising subsequent encrypted repositories.

Related pull requests

These pull requests are likely superseded by this one, but they remain a valuable resource full of discussion and context for this work:

Related issues

These issues would be fixed by this PR. Again, they contain useful information about the history and context of this work:

Erotemic commented 1 year ago

I will try to look at this in more detail later, but given a quick read I'm skeptical that the determenistic salt method is valid. The method for generating the salt is determenistic, and even if it takes time to compute it once, it still only needs to be computed once per password. If an attacker has a rainbow table of hashed passwords, they can modify that rainbow table with a single pass (albiet an expensive single pass) to determine the salt that corresponds to each common password.

I think we want to be very careful to avoid doing anything that resembles rolling our own encryption and we want to stick to standard practice as strictly as possbile. I don't think generating a random salt is avoidable, but I'm not a professional cryptographer so I'm open to hearing others' thoughts.

jmurty commented 1 year ago

Hi @Erotemic you are completely right, my attempt to avoid the need for an additional "project salt" configuration setting makes attacks easier than they should be.

I have reverted this misguided approach and gone back to a variation of the approach from your PR: project salt is an additional value, randomly-generated by default, which is stored in the Git config and must be provided by the user (or in-repo storage) along with the password to decrypt a repository using PBKDF2.