borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Using a more modern KDF than PBKDF2 #747

Closed enkore closed 2 years ago

enkore commented 8 years ago

With repokey the encrypted keys are stored in the repo itself. So in almost any attack scenario the attacker has access to them (and if we store a repo in essentially untrusted storage, e.g. "the cloud", we must assume that the repokey blob is essentially 'public knowledge'). This makes me think that PBKDF2 (with its susceptibility to GPU-, FPGA- and ASIC-based attacks) might not be the best choice here.

Upgrading the repokey can be done safely and transparently to the user when accessing a repo. Just increasing the number of PBKDF2 iterations would not make the problem go away.

There are some recent developments into key derivation functions which are very hard to speed up with GPUs or dedicated hardware, e.g. scrypt, argon2d/argon2i. These can be tuned to derive a key in a reasonable time frame on commodity hardware (0.1 s < t < 1 s) while off-line attacks remain essentially inefficient.
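A minimal sketch of what a memory-hard KDF looks like in practice, using scrypt from Python's standard library (`hashlib.scrypt`, available when Python is built against OpenSSL 1.1 or newer). The cost parameters below are illustrative assumptions, not values borg uses; in a real setup they would be tuned until one derivation lands in the 0.1 s to 1 s window.

```python
import hashlib
import os
import time

# Illustrative scrypt cost parameters (assumed, not borg's):
# memory use is roughly 128 * R * N bytes, i.e. 16 MiB here.
N, R, P = 2**14, 8, 1


def derive_key_scrypt(passphrase: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte key with the memory-hard scrypt KDF."""
    return hashlib.scrypt(passphrase, salt=salt, n=N, r=R, p=P, dklen=32)


salt = os.urandom(16)
start = time.perf_counter()
key = derive_key_scrypt(b"correct horse battery staple", salt)
elapsed = time.perf_counter() - start
print(f"{len(key)}-byte key in {elapsed:.3f} s, ~{128 * R * N // 2**20} MiB of memory")
```

Doubling N roughly doubles both the time and the memory each guess needs; a PBKDF2 iteration count scales only time, which massively parallel hardware absorbs cheaply.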

I think we are in a very favourable position here, since it is not an issue if the KDF takes one second per key.

EDIT: This of course also affects the other key storage methods, repokey just being the most handy of them.

ThomasWaldmann commented 8 years ago

There was already a discussion about pbkdf2, iterations, etc. in another ticket.

One paper linked from there pointed out that increasing the iteration count is not the best way to increase security; one should increase the passphrase length instead.
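To make that point concrete: the extra work from key stretching can be expressed in bits as log2 of the iteration count and compared with the entropy of one extra random passphrase word. The iteration count (100,000) and the 7776-entry Diceware-style word list are assumptions for illustration, not settings taken from borg.

```python
import math

# Assumed for illustration: a PBKDF2 iteration count of 100,000 and a
# Diceware-style word list with 7776 entries.
ITERATIONS = 100_000
WORDLIST_SIZE = 7776


def bits_from_iterations(iterations: int) -> float:
    """Extra brute-force work from key stretching, expressed in bits."""
    return math.log2(iterations)


def bits_per_word(wordlist_size: int) -> float:
    """Entropy contributed by one word chosen uniformly at random."""
    return math.log2(wordlist_size)


stretch_bits = bits_from_iterations(ITERATIONS)
word_bits = bits_per_word(WORDLIST_SIZE)
print(f"{ITERATIONS} iterations add ~{stretch_bits:.1f} bits")
print(f"one random word adds ~{word_bits:.1f} bits")
print(f"the whole stretching step is worth ~{stretch_bits / word_bits:.2f} words")
# Doubling the iteration count buys exactly one more bit.
print(f"doubling iterations: +{bits_from_iterations(2 * ITERATIONS) - stretch_bits:.1f} bit")
```

Each additional random word multiplies the attacker's work by 7776, while each doubling of the iteration count only doubles it and also doubles the legitimate user's wait.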

So, do you still think we need another kdf even if people choose long passphrases?

The problem with these other kdfs is that they often add another dependency. If we add a dependency that is not present in some linux distribution, borg can't be packaged for it (or is way more effort to package all the stuff needed). Even worse, if we add a conflicting requirement, it might even block packaging.

enkore commented 8 years ago

another ticket

I should have searched before creating this one: #77

Yes, the dependency hell is an issue, especially since soft-dependencies are IMHO not an option for borg core functionality. Vendoring might be an option, since they're small, but meh. That's not a clean solution either.

I don't think that there is an "immediate need for action". There is, however, a point to being "truly and completely paranoid safe rather than sorry for being just a little paranoid safe". Of course a very long, high entropy pass phrase is an absolute must, independent of KDF.

Specifically re. argon there are even two bindings floating around, both of which are maintained. Huh?

In summary: maybe increase the iterations a little, or use calibration (on borg create / KeyfileKeyBase.create), but either one is really a "nice to have". Other KDFs are more of a long-term thing, until they have reached common libraries and distros.
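Calibration could look roughly like this: time a small, fixed number of PBKDF2-HMAC-SHA256 iterations on the current machine and scale up to a target duration. `calibrate_pbkdf2_iterations` and its default values are hypothetical, not part of borg; only `hashlib.pbkdf2_hmac` is a real stdlib call.

```python
import hashlib
import os
import time

# Hypothetical helper, not part of borg: time a fixed number of
# PBKDF2-HMAC-SHA256 iterations on this machine and scale the count so
# that one key derivation takes roughly `target_seconds`.


def calibrate_pbkdf2_iterations(target_seconds: float = 0.5,
                                probe_iterations: int = 10_000,
                                minimum: int = 100_000) -> int:
    salt = os.urandom(32)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"calibration", salt, probe_iterations)
    elapsed = time.perf_counter() - start
    # PBKDF2 cost grows linearly with the iteration count.
    scaled = int(probe_iterations * target_seconds / max(elapsed, 1e-9))
    # Never go below a fixed floor, so slow devices still get a baseline.
    return max(scaled, minimum)


if __name__ == "__main__":
    print(f"calibrated iteration count: {calibrate_pbkdf2_iterations()}")
```

Because the count differs per machine, it would have to be stored next to the salt in the key so that every client re-derives the same key.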

Leave this open for now and tag later?

ThomasWaldmann commented 8 years ago

Yes, tagging it "later". Maybe edit ticket title to "use more modern kdf?"?

ssd63 commented 8 years ago

Of course a very long, high entropy pass phrase is an absolute must, independent of KDF.

I object. In a perfect world, where everybody used secure passwords and chose a different password every time, nobody would care about the speed of a KDF, as cracking the key itself would cost as much time as cracking the password. Indeed, weak passwords (only 16 alphanumeric chars) and password reuse are the reason for all those modern KDFs.

ghost commented 7 years ago

The problem with these other kdfs is that they often add another dependency. If we add a dependency that is not present in some linux distribution, borg can't be packaged for it (or is way more effort to package all the stuff needed). Even worse, if we add a conflicting requirement, it might even block packaging.

Today we use Docker or similar tools to split projects into many microservices. Anyway, you could add this support the way Django does and make it optional (for example, if you have the skills, you can install Argon2i or bcrypt or something like that, which requires a third-party package; if not, you can use the default settings).

enkore commented 7 years ago

We don't want optional or "it depends" features in core functionality like the crypto, because it means that some packaged versions of Borg can't read what Borg from another distro or just another release of a distro wrote, even if they're exactly the same version. For example to be able to restore data one would need to consider what packages are available on the distro and what the maintainer decided he would compile in etc. and that's bad.

While PBKDF2 isn't exactly state of the art anymore, there are no known flaws in it (only that there are better alternatives); and if the password has enough entropy in it, then it will withstand attacks of significant scale*. A better KDF shouldn't be a reason to step down passphrase entropy.

* Borg uses PBKDF2-SHA256, so it has 256 bits of internal state you can stuff with entropy via the passphrase. A passphrase with more than 256 bits of entropy wouldn't make it any harder. However, attacking the AES encryption of the key directly probably becomes more economical before that... not that anyone has ever managed to do that.
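The footnote's arithmetic is easy to check. This sketch assumes passwords drawn uniformly at random from the 62 alphanumeric characters (the "16 alphanumeric chars" case mentioned earlier) and takes the 256-bit cap from the footnote; it measures entropy only, not the cost of an actual attack.

```python
import math

ALPHANUMERIC = 62  # a-z, A-Z, 0-9
STATE_BITS = 256   # the PBKDF2-SHA256 state size from the footnote


def random_password_bits(length: int, alphabet_size: int = ALPHANUMERIC) -> float:
    """Entropy of a password whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def effective_bits(length: int) -> float:
    """Passphrase entropy beyond the 256-bit state no longer helps."""
    return min(random_password_bits(length), STATE_BITS)


def length_to_saturate(alphabet_size: int = ALPHANUMERIC) -> int:
    """Shortest random password that reaches the 256-bit cap."""
    return math.ceil(STATE_BITS / math.log2(alphabet_size))


print(f"16 random alphanumerics: {random_password_bits(16):.1f} bits")
print(f"characters needed to reach {STATE_BITS} bits: {length_to_saturate()}")
```

So a 16-character random alphanumeric password sits at roughly 95 bits, far below the cap; saturating the cap needs about 43 such characters.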

ghost commented 7 years ago

But it's not possible to choose easy and secure together.

If I need security, that means I can rebuild the kernel with my custom config options, split one big project into many small microservices, limit their access rights (put all this code in different containers or KVM VPSes), resources etc., use disk encryption etc. On the client machine, it means https://tails.boum.org/ , which uses Tor for everything...

I can spend a week learning such things.

In the end, the solution will be secure, but it will not be possible to reuse it on another OS or maybe even on another CPU... And that is an acceptable solution for people who need to do things securely.

I am not talking about the default settings (I know a lot of newbies who come to Linux and know nothing more than apt-get install), but as an option we should do something for people who need ultimate security settings. Otherwise this feature will not be usable for them, and they will not use encryption, or will only add it later.

enkore commented 7 years ago

As @ssd63 and I pointed out above a better KDF doesn't do anything for security if your passphrase is of high quality.

Using argon2 would mainly benefit people whose passphrases are not of high quality. Users with high quality passphrases receive no benefit.

ghost commented 7 years ago

Just 2 words about compatibility problems.

There is no problem to use borgbackup from Docker container with any even exotic dependencies.

1. Just follow the guide https://docs.docker.com/engine/installation/linux/ubuntulinux/ and install Docker on the server and client machines (Docker for Mac or Docker for Windows, for example). Just copy-paste from the guide as usual.

2. On the client machine, create a single file named Dockerfile with these contents:

FROM buildpack-deps:jessie
# It's Debian Jessie + some packages , more here https://hub.docker.com/_/buildpack-deps/

RUN set -ex \
    && echo 'deb http://ftp.debian.org/debian jessie-backports main' >> /etc/apt/sources.list \
    && apt-get update \
    && apt-get -y dist-upgrade \
    && apt-get -y autoremove \
    && apt-get autoclean \
    && apt-get -y -t jessie-backports  install borgbackup

3. Run the build command docker build -t user/repository:borgbackup-latest . where user and repository are your user and repository on https://hub.docker.com/ (one private repository is free; in this case you can also use a public one). This command will build an image with borgbackup. You can improve the RUN command and run the build again to get an updated version of the image.

4. Run the push command docker push user/repository:borgbackup-latest to send the image to the repository.

5. Run borgbackup on the server with docker run -it --rm user/repository:borgbackup-latest borg -V and you will see borg 1.0.7.

Let's assume the borg collective creates a repository with a Docker image [with all exotic dependencies if needed] and pushes it to a public repository. Then, to use borgbackup, the end user only needs to install Docker on the server and nothing more. They can use your image directly from your official repository (step 5) or copy-paste the Dockerfile and build it themselves.

So I mean, it should not be a problem to install borgbackup or any other software on any server. Just install Docker and use any software with any dependencies you need...

ghost commented 7 years ago

So I suggest giving users a choice: if they need easy and simple tools, they just use the default settings and it will work everywhere. If they are perfectionists and prefer step-by-step configuration, why not?.. They understand it may require some skills and they are ready... As a perfectionist, I would like to choose all settings myself and select the algorithms. And with Docker even newbies can use them everywhere too (just add some lines to the RUN commands)... Even if he/she uses, for example, Gentoo, he/she can use a Docker container with Debian and borgbackup inside with all dependencies.

enkore commented 7 years ago

Currently key files are encrypted using AES-CTR and HMAC-SHA256 in an Encrypt-and-MAC (!= -then-) scheme. That doesn't have any known weaknesses (unlike AES-CBC, which would at least theoretically make Borg vulnerable to padding oracle attacks [at the rate of one-guess-per-borg-invocation-and-failure]), but could also be done using our usual EtM scheme, which strongly adheres to up-to-date cryptographic recommendations (since it doesn't require much head-scratching to see that it works).
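To illustrate the difference between the two schemes, here is a standard-library-only sketch. Note the swapped-in technique: AES-CTR is replaced by an HMAC-SHA256 counter-mode keystream purely so the example runs without third-party packages; it is NOT borg's cipher, the function names are hypothetical, and it should not be used for real data.

```python
import hashlib
import hmac
import secrets

# Stand-in stream cipher (NOT AES-CTR, NOT borg's code): HMAC-SHA256 over
# nonce || counter produces the keystream, which is XORed with the data.


def keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(enc_key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_and_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes):
    """E&M: the tag is computed over the PLAINTEXT."""
    nonce = secrets.token_bytes(16)
    ciphertext = xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, plaintext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes):
    """EtM: the tag is computed over nonce + CIPHERTEXT."""
    nonce = secrets.token_bytes(16)
    ciphertext = xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def etm_open(enc_key: bytes, mac_key: bytes, nonce: bytes,
             ciphertext: bytes, tag: bytes) -> bytes:
    """Verify the EtM tag (constant-time compare) before decrypting anything."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return xor(ciphertext, keystream(enc_key, nonce, len(ciphertext)))
```

With E&M the receiver has to decrypt first and can only then check the tag, so any processing of unauthenticated plaintext (such as padding checks in a CBC mode) happens before a forgery is detected. With EtM a tampered ciphertext is rejected before decryption even starts.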

So for key file format v2 we want:

Note that this is wholly incompatible with older borg versions, but if we improve the handling of unknown/unparsable keys, then we could leave a v1 borg key file locally for older clients.

(I melded this and #2173 together since they're quite close together and both break compat in the same spot)

rugk commented 6 years ago

As for KDFs, even the Wikipedia article mentions some alternatives…

"Encrypt-and-MAC" Really? (In the docs this is not mentioned at all…)

That's not nice at all. Encrypt-then-MAC should be preferred nowadays. From the page quoted here, I see that SSH had problems with it, that the integrity of the plaintext cannot be ensured, and similar problems.

The most important point in favor of Encrypt-then-MAC however is this one:

The MAC does not provide any information on the plaintext

And that's the important point, which prevents any side-channel attacks using the MAC in the other modes…

rugk commented 6 years ago

But should not we rather open a new issue for this topic? Or maybe include it in https://github.com/borgbackup/borg/issues/1044? @ThomasWaldmann

enkore commented 6 years ago

That's documented here.

The most important point in favor of Encrypt-then-MAC however is this one:

Actually, the most important point is that modes with validated padding are susceptible to a padding oracle attack in E&M which is able to decrypt ciphertexts in a linear number of tries (~128 tries per ciphertext byte).

A MAC is not required to be a PRF, but within the standard model HMAC is a PRF if the hash is a PRF, and SHA2 is a PRF within its security margin. Therefore, no information on the plaintext is provided. (This is a bit hand-wavy since I'm preoccupied right now, but it's essentially correct)

Fxrh commented 2 years ago

Are there currently any plans to support Argon2i as KDF?

I think argon2i would increase the security of borg in the RepoKey mode significantly, as even decent-length passwords will always be the weakest link in such a setup. Further, regarding the dependency problem, LUKS2 (i.e., the linux disk encryption system, version 2 seems to exist since kernel 4.12) uses Argon2i as default for key derivation, so I'd assume argon2i should exist on any current Linux distribution by now.

ThomasWaldmann commented 2 years ago

I could imagine we add argon2 within the helium milestone. Needs some careful checks for availability, usability, etc. first though.

Update: the Python package argon2-cffi seems to be quite widespread. Considering there's an RFC for Argon2 now, support for it seems quite safe and will improve in the future. I found packages for misc. Linuxes, BSDs, macOS and Windows, but not for OpenIndiana.

Also, performance needs to be evaluated. Some people use devices like the Raspberry Pi, others have very high-speed server or desktop CPUs. borg needs to support a wide range of devices, except those that are already problematic for other reasons (e.g. not enough RAM for borg's in-memory hashtables).

Update: argon2-cffi supports different profiles. The high-memory one (2 GiB) could be problematic for some users; the low-memory one (64 MiB) should work for all borg users.
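The 2 GiB and 64 MiB profiles correspond to the two Argon2id recommendations in RFC 9106 (t=1, p=4, m=2^21 KiB and t=3, p=4, m=2^16 KiB). A pure-Python sketch of those parameter sets plus a hypothetical policy for choosing between them by available RAM; `Argon2Params` and `pick_profile` are illustrative names, not argon2-cffi or borg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argon2Params:
    name: str
    time_cost: int        # iterations (t)
    memory_cost_kib: int  # memory in KiB (m)
    parallelism: int      # lanes (p)

    @property
    def memory_mib(self) -> int:
        return self.memory_cost_kib // 1024


# The two Argon2id recommendations from RFC 9106, section 4.
HIGH_MEMORY = Argon2Params("high-memory", time_cost=1, memory_cost_kib=2**21, parallelism=4)
LOW_MEMORY = Argon2Params("low-memory", time_cost=3, memory_cost_kib=2**16, parallelism=4)


def pick_profile(available_mib: int, headroom: float = 0.5) -> Argon2Params:
    """Hypothetical policy: use the high-memory profile only if it fits in
    `headroom` of the available RAM, otherwise fall back to low-memory."""
    if HIGH_MEMORY.memory_mib <= available_mib * headroom:
        return HIGH_MEMORY
    return LOW_MEMORY


for ram in (1024, 8192):  # e.g. a small single-board computer vs. a desktop
    print(ram, "MiB RAM ->", pick_profile(ram).name)
```

Any such choice could only apply at key creation: whichever machine later unlocks the key must run the same parameters, so a key made with the high-memory profile on a desktop would be expensive or impossible to open on a small device. That argues for the low-memory profile as the default.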

ThomasWaldmann commented 2 years ago

Ideas / Plan:

Bounty: https://app.bountysource.com/issues/31864649-long-er-term-security-using-a-more-modern-kdf-than-pbkdf2

hexagonrecursion commented 2 years ago

Sounds fun! I'll give this a try.

ThomasWaldmann commented 2 years ago

Note: There's some stuff to consider: https://github.com/borgbackup/borg/issues/1579#issuecomment-245043607 when designing the new kdf / borg key generation.

hexagonrecursion commented 2 years ago

Compatibility considerations

The current versions of borg packaged and deployed today do not know how to read the new format. Even if we backport to 1.1 and 1.2, the backport will take a long time to reach everyone.

Possible courses of action

User interface considerations:

ThomasWaldmann commented 2 years ago

Due to the issues (e.g. syncing, security) related to using multiple clients with the same repo, I guess many users will use 1 repo per client. For them, it will be easy: just upgrade the client, done.

For the ones sharing a repo between multiple clients, we could also require that they upgrade all clients (if we can't easily support a mixed setup). Should be possible by using our binary, even if dists have different versions depending on their age and update policy.

Currently we can only store one key as a repokey, so I don't think we want to have new and old keys for the same repo (and maybe we want to do the same for keyfile, even if we could have multiple files there).

New repos made with new borg should default to new KDF / new key format.

hexagonrecursion commented 2 years ago

Good

I like that security best practices are not opt-in.

hexagonrecursion commented 2 years ago

My plan is to submit this as multiple PRs. I intend to start top-down - implement the UI and documentation changes first. I will use the old key format as a placeholder until I implement the new one. I will use the Branch by Abstraction idiom by gating the new behavior behind export BORG_DEBUG_ARGON2=1
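Gating new behaviour behind an environment variable could look roughly like this. Only `BORG_DEBUG_ARGON2` and the algorithm strings come from this thread ('sha256' is the existing PBKDF2-based value); the function and constant names are hypothetical, not borg's code.

```python
import os

# Hypothetical names; only BORG_DEBUG_ARGON2 and the algorithm strings
# come from the discussion.
LEGACY_ALGORITHM = "sha256"  # existing PBKDF2-based EncryptedKey algorithm
NEW_ALGORITHM = "argon2 aes256-ctr hmac-sha256"


def argon2_enabled(environ=None) -> bool:
    """True only when BORG_DEBUG_ARGON2=1 is set in the environment."""
    environ = os.environ if environ is None else environ
    return environ.get("BORG_DEBUG_ARGON2") == "1"


def key_algorithm_for_new_keys(environ=None) -> str:
    """While the flag is unset, keep writing the old format as the placeholder."""
    return NEW_ALGORITHM if argon2_enabled(environ) else LEGACY_ALGORITHM
```

Accepting an optional `environ` mapping keeps the toggle easy to test without touching the real process environment.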

ThomasWaldmann commented 2 years ago

A) OK

B) This would immediately and automatically break mixed setups with shared repos. Key upgrade could be also done manually via borg key .... A prompt is not very helpful as a lot of stuff is scripted and there is no interactive user to prompt (so it would just hang or fail).

C) There could be a "borg key change-kdf" or "... change-type". In one of my latest PRs I am working on "borg key change-location" for repokey <-> keyfile.

D) This is not necessary if we do not automatically upgrade. Sometimes less automatic / less magic is better because then you don't need to invent counter-automatic / counter-magic measures either.

hexagonrecursion commented 2 years ago

B) This would immediately and automatically break mixed setups with shared repos. Key upgrade could be also done manually via borg key .... A prompt is not very helpful as a lot of stuff is scripted and there is no interactive user to prompt (so it would just hang or fail).

I am sorry. I misunderstood your previous comment. Thanks for the clarification.

hexagonrecursion commented 2 years ago

EncryptedKey.algorithm is a confusingly ambiguous name. I did eventually find the documentation telling me that it means both the KDF algorithm and the HMAC algorithm. Surprisingly, there were no comments in the source. Should we rename it to message_authentication_algorithm while we are at it?

https://github.com/borgbackup/borg/blob/dfd7ea8171a947b3fe8730ac8d2c189e803985e9/src/borg/item.pyx#L302

hexagonrecursion commented 2 years ago

There are several commands that encrypt the key

ThomasWaldmann commented 2 years ago

About algorithm: I would avoid changing the attribute name, but for v2 keys, we can have better values, like argon2-aes256-ctr-hmac-sha256 (or whatever we end up with).

ThomasWaldmann commented 2 years ago

commands dealing with the key:

hexagonrecursion commented 2 years ago
More user experience considerations: a key currently has two versions: borg.item.EncryptedKey.version == 1 and borg.item.Key.version == 1. I think we should not expose both versions separately in the user interface; just one version is enough, and two would be unnecessary cognitive overhead for the end user. I propose:

| User-visible version | EncryptedKey.version | Key.version |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 (a hypothetical future expansion) | 2 | 3 |
ThomasWaldmann commented 2 years ago

[Encrypted]Key.version: I don't think this is for the end user or UI, but rather for the borg code to process different kinds of keys correctly.

It's not totally clear to me right now (check if there are docs). Looks like there is some overlap: version and algorithm could each do that (let borg process such a key correctly) on their own already. Problem cases like algorithm A version 1 meaning something different than algorithm A version 2 could be avoided by just choosing different algorithm names.

Update: There is a bit: https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#key-files

ThomasWaldmann commented 2 years ago

Key: version should stay at 1 as long as the data structure is compatible with what we have in borg < 1.3. I don't see a good reason why we should change that version number now. In my AEAD crypto PR, I just use ikm=enc_key + enc_hmac_key, so no change in the key data structure or version is needed. There is no "algorithm" in Key.
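What `ikm` is fed into is not spelled out here. Assuming it means the input keying material of HKDF (RFC 5869), the usual meaning of the name, the idea can be sketched with the standard library: the stored key material is unchanged and only the derivation of a working key is new. The `info` label and the random stand-in keys are hypothetical; this is not the code from the PR.

```python
import hashlib
import hmac
import os

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF-SHA256")
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: default salt is HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Random stand-ins for the existing key fields; the Key structure itself
# does not change, which is why no Key.version bump is needed.
enc_key = os.urandom(32)
enc_hmac_key = os.urandom(32)
ikm = enc_key + enc_hmac_key
session_key = hkdf_sha256(ikm, salt=os.urandom(32), info=b"hypothetical-context", length=32)
```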

EncryptedKey: there we have algorithm and version, and it is all about how the inner Key is processed / encrypted and what the outer EncryptedKey looks like.

An interesting question might be whether a v2 EncryptedKey (or Vn+1 in general) has a superset of the attributes of a v1 (Vn in general) key. If that is not the case, we should maybe first peek into data (see the creation of EncryptedKey(internal_dict=data)), extract the version from there and then, depending on the version found, create an object of class EncryptedKeyV1 or EncryptedKeyV2; they could be totally different.
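The "peek, then pick a class" idea can be sketched as follows. The V1/V2 classes and all field names are illustrative assumptions (borg has a single EncryptedKey item class, and the v2 fields do not exist); a plain dict stands in for the unpacked key data.

```python
from dataclasses import dataclass

# Hypothetical classes: they only illustrate the "peek at the version,
# then build a completely different class" idea. Field names are assumed.


@dataclass
class EncryptedKeyV1:
    algorithm: str
    iterations: int
    salt: bytes
    hash: bytes
    data: bytes


@dataclass
class EncryptedKeyV2:
    algorithm: str
    argon2_time_cost: int
    argon2_memory_cost: int
    argon2_parallelism: int
    salt: bytes
    data: bytes


KEY_CLASSES = {1: EncryptedKeyV1, 2: EncryptedKeyV2}


def load_encrypted_key(unpacked: dict):
    """Peek at 'version', then build the matching class from the other fields."""
    version = unpacked.get("version")
    try:
        cls = KEY_CLASSES[version]
    except KeyError:
        raise ValueError(f"unsupported key version: {version!r}") from None
    fields = {k: v for k, v in unpacked.items() if k != "version"}
    return cls(**fields)
```

Since the classes share no base structure, a v2 key is free to drop or rename v1 attributes entirely; the version field is the only thing both layouts must agree on.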

hexagonrecursion commented 2 years ago

I am sorry. I should communicate better.

I wanted to figure out what interface to present to the end user and how to document it.

## The user may want to create a key compatible with old borgs:
borg init --encrypted-key-version 1 ...
# By year 3000 this may grow to:
borg init --key-version 1 --encrypted-key-version 1 ...

## Even if --key-version 2 --encrypted-key-version 2 is the current default for `borg init`,
# `borg key change-version` will require the user to be explicit:
# Only affects Key.version:
borg key change-version --key-version 2 ...
# Only affects EncryptedKey.version:
borg key change-version --encrypted-key-version 2 ...

We could simplify and use one argument to control both:

| --key-version | Key.version | EncryptedKey.version | EncryptedKey.algorithm defaults to |
|---|---|---|---|
| 1 | 1 | 1 | sha256 |
| 2 | 1 | 2 | argon2 aes256-ctr hmac-sha256 |
| 3 | 3 (or 2 if you prefer not to skip a version) | 2 | foobar42 |

We may want to introduce a separate switch for the algorithm at some point, but I think having separate --key-version and --encrypted-key-version options is unnecessary cognitive overhead for the end user.

rugk commented 2 years ago

IMHO (just a note from an outsider), any string (like the name of the algorithm actually used) would be better UX than tossing version numbers around. Or maybe just an --upgrade-to-latest for borg key, or generally a "latest" value for the key version or whatever. I guess in 99% of the use cases you want to use the latest version, i.e. whatever is currently recommended security-wise.

hexagonrecursion commented 2 years ago

Here is another thought: we could keep EncryptedKey.version at 1 and dispatch based on EncryptedKey.algorithm instead. I am also considering folding the argon2 type into the algorithm: 'argon2id aes256-ctr hmac-sha256' instead of 'argon2 aes256-ctr hmac-sha256' with a separate field for the type. We can then present a cleaner UI:

# --key-algorithm will eventually default to 'argon2id aes256-ctr hmac-sha256'
borg init ...
# Explicitly create a key compatible with old borgs:
# 'pbkdf2-sha256 aes256-ctr hmac-sha256' will internally
# map to 'sha256' - the magic string we use to refer to this
# algorithm in our file format
borg init --key-algorithm 'pbkdf2-sha256 aes256-ctr hmac-sha256' ...
# Upgrade the algorithm to the current best recommendation:
borg key change-algorithm ...
# Same as above (for now), but explicit
borg key change-algorithm --key-algorithm 'argon2id aes256-ctr hmac-sha256' ...
# Downgrade
borg key change-algorithm --key-algorithm 'pbkdf2-sha256 aes256-ctr hmac-sha256' ...

I think this is a cleaner interface than anything involving version numbers that imply algorithms or algorithms that imply version numbers, and I think dispatching on EncryptedKey.algorithm is the most straightforward way to implement this.

ThomasWaldmann commented 2 years ago

There is no need to put the argon2 with type into the algorithm name, it would just complicate things. IF we want the type flexible and not just always use ID, we'll have that in some argon2_type attribute in the key like for all the other variable parameters of argon2.

For the EncryptedKey, try keeping the version at 1 and implement dispatch only based on algorithm name and we'll see how that goes.
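Dispatch on the algorithm name alone, with the version pinned at 1 and the Argon2 variant kept in its own attribute, could be sketched like this. The algorithm strings and the 'pbkdf2-sha256 aes256-ctr hmac-sha256' to 'sha256' alias come from the discussion; the handler functions, `argon2_type` default and `argon2_time_cost` field are hypothetical, and the handlers are stubs that only report which path was taken.

```python
# Hypothetical sketch, not borg's code. Handlers are stubs that return a
# description instead of actually decrypting the key.

USER_FACING_ALIASES = {
    # user-facing name -> magic string stored in the key file
    "pbkdf2-sha256 aes256-ctr hmac-sha256": "sha256",
}


def decrypt_pbkdf2(key: dict, passphrase: str) -> str:
    return f"pbkdf2 with {key['iterations']} iterations"


def decrypt_argon2(key: dict, passphrase: str) -> str:
    # The Argon2 variant is a separate attribute, not part of the algorithm name.
    argon2_type = key.get("argon2_type", "id")
    return f"argon2{argon2_type} with t={key['argon2_time_cost']}"


HANDLERS = {
    "sha256": decrypt_pbkdf2,
    "argon2 aes256-ctr hmac-sha256": decrypt_argon2,
}


def storage_name(user_name: str) -> str:
    """Map a user-facing algorithm name to the string stored in the key file."""
    return USER_FACING_ALIASES.get(user_name, user_name)


def decrypt_key(key: dict, passphrase: str) -> str:
    if key.get("version") != 1:
        raise ValueError("unsupported EncryptedKey version")
    try:
        handler = HANDLERS[key["algorithm"]]
    except KeyError:
        raise ValueError(f"unknown key algorithm: {key['algorithm']!r}") from None
    return handler(key, passphrase)
```

Adding a future KDF then means registering one more handler under a new algorithm name, with no change to the version field.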

ThomasWaldmann commented 2 years ago

+50 USD for the additional work caused by conflicting changes by #6463.

ThomasWaldmann commented 2 years ago

@hexagonrecursion fixed this in these changesets:

https://github.com/borgbackup/borg/pull/6468 https://github.com/borgbackup/borg/pull/6469 https://github.com/borgbackup/borg/pull/6549 https://github.com/borgbackup/borg/pull/6552 https://github.com/borgbackup/borg/pull/6556 https://github.com/borgbackup/borg/pull/6560

Thanks a lot!

ThomasWaldmann commented 2 years ago

https://github.com/borgbackup/borg/commit/56c27a99d0e1482003003bf1f43c43281c42b457#r70931262

dependency issue found by @bket.

hexagonrecursion commented 2 years ago

56c27a9#r70931262

dependency issue found by @bket.

I'll try to find time for this tomorrow