apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
https://pegasus.apache.org/
Apache License 2.0

Data at rest encryption #1575

Closed: acelyc111 closed this issue 7 months ago

acelyc111 commented 1 year ago

Motivation

Some Pegasus users store private data in Pegasus, so it's important to protect that data against unauthorized access by anyone who gains access to the storage media used by Pegasus.

Supporting data at rest encryption gives users a way to protect their data that is transparent to applications and straightforward for operators to set up.

Data at rest encryption refers to encrypting data for storage and decrypting it when reading the stored data. It uses symmetric encryption where the same key is used to encrypt and to decrypt the data. Keys need to be stored and handled securely as anyone with access to a key will be able to decrypt any data encrypted with it.

Cloud disk encryption

If your Pegasus clusters are deployed on public cloud storage services, you can use the cloud providers' own encryption solutions. See:

In that case, there is no need to enable Pegasus data at rest encryption; encrypting and decrypting the data twice may lead to poor performance.

Goals

Non-Goals

Cryptography overview

Symmetric-key algorithm

Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of cipher-text. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, in comparison to public-key encryption (also known as asymmetric-key encryption).

AES

The Advanced Encryption Standard (AES) is a block cipher with a block size of 128 bits and three possible key lengths: 128, 192 and 256 bits. AES supersedes the Data Encryption Standard (DES). The algorithm described by AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.

Block cipher

A block cipher is a deterministic algorithm that operates on fixed-length groups of bits, called blocks. Block ciphers are the elementary building blocks of many cryptographic protocols. They are ubiquitous in the storage and exchange of data, where such data is secured and authenticated via encryption.

A block cipher uses blocks as an unvarying transformation. Even a secure block cipher is suitable for the encryption of only a single block of data at a time, using a fixed key. A multitude of modes of operation have been designed to allow their repeated use in a secure way to achieve the security goals of confidentiality and authenticity. However, block ciphers may also feature as building blocks in other cryptographic protocols, such as universal hash functions and pseudorandom number generators.

ROT13

ROT13 ("rotate by 13 places") is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the latin alphabet.

Because there are 26 letters (2×13) in the basic Latin alphabet, ROT13 is its own inverse; that is, to undo ROT13, the same algorithm is applied, so the same action can be used for encoding and decoding. The algorithm provides virtually no cryptographic security, and is often cited as a canonical example of weak encryption.

facebook/rocksdb uses ROT13 as an encryption sample.
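
As an illustration, here is a minimal sketch of the classic letter-substitution form of ROT13; it is not the RocksDB sample code itself, only a reminder of how trivial (and how weak) the transformation is.

#include <string>

// Shifts every Latin letter by 13 positions; applying the function twice
// restores the original input, and all other bytes are left untouched.
std::string Rot13(const std::string& in) {
  std::string out(in);
  for (char& c : out) {
    if (c >= 'a' && c <= 'z') {
      c = static_cast<char>('a' + (c - 'a' + 13) % 26);
    } else if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>('A' + (c - 'A' + 13) % 26);
    }
  }
  return out;
}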

Block cipher mode of operation

In cryptography, a block cipher mode of operation is an algorithm that uses a block cipher to provide information security such as confidentiality or authenticity. A block cipher by itself is only suitable for the secure cryptographic transformation (encryption or decryption) of one fixed-length group of bits called a block. A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block.

Most modes require a unique binary sequence, often called an initialization vector (IV), for each encryption operation. The IV has to be non-repeating and, for some modes, random as well. The initialization vector is used to ensure distinct ciphertexts are produced even when the same plaintext is encrypted multiple times independently with the same key. Block ciphers may be capable of operating on more than one block size, but during transformation the block size is always fixed. Block cipher modes operate on whole blocks and require that the last part of the data be padded to a full block if it is smaller than the current block size. There are, however, modes that do not require padding because they effectively use a block cipher as a stream cipher.

IV, Initialization Vector

In cryptography, an initialization vector (IV) or starting variable (SV) is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation.

CTR, Counter mode

Counter mode turns a block cipher into a stream cipher. It generates the next keystream block by encrypting successive values of a "counter". The counter can be any function which produces a sequence which is guaranteed not to repeat for a long time, although an actual increment-by-one counter is the simplest and most popular. The usage of a simple deterministic input function used to be controversial; critics argued that "deliberately exposing a cryptosystem to a known systematic input represents an unnecessary risk". However, today CTR mode is widely accepted, and any problems are considered a weakness of the underlying block cipher, which is expected to be secure regardless of systemic bias in its input. Along with CBC, CTR mode is one of two block cipher modes recommended by Niels Ferguson and Bruce Schneier.
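
To make the construction concrete, here is a rough sketch of CTR mode. BlockEncrypt is a hypothetical stand-in for the underlying 128-bit block cipher (e.g. one AES call), which in practice would come from a crypto library such as OpenSSL.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical single-block encryption with a 16-byte block size (e.g. AES).
void BlockEncrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]);

// CTR mode: keystream[i] = E_key(counter + i); output = input XOR keystream.
// Decryption is the exact same operation, which is how CTR turns a block
// cipher into a stream cipher and avoids padding the last block.
std::vector<uint8_t> CtrCrypt(const uint8_t key[16], const uint8_t iv[16],
                              const std::vector<uint8_t>& input) {
  std::vector<uint8_t> output(input.size());
  uint8_t counter[16];
  std::memcpy(counter, iv, 16);  // the IV seeds the counter
  for (size_t off = 0; off < input.size(); off += 16) {
    uint8_t keystream[16];
    BlockEncrypt(key, counter, keystream);
    size_t n = std::min<size_t>(16, input.size() - off);
    for (size_t i = 0; i < n; ++i) {
      output[off + i] = input[off + i] ^ keystream[i];
    }
    // Increment the counter block as a big-endian integer.
    for (int i = 15; i >= 0 && ++counter[i] == 0; --i) {
    }
  }
  return output;
}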

OpenSSL

OpenSSL contains an open-source implementation of the SSL and TLS protocols. The core library, written in the C programming language, implements basic cryptographic functions and provides various utility functions. Wrappers allowing the use of the OpenSSL library in a variety of computer languages are available.

OpenSSL supports a number of different cryptographic algorithms, including AES mentioned above.
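
As a reference point for the implementation discussed below, a minimal sketch of AES-256-CTR through OpenSSL's EVP interface looks roughly like this (error handling trimmed); since CTR is symmetric, the same routine decrypts when given the ciphertext.

#include <openssl/evp.h>
#include <vector>

// Encrypts (or, with ciphertext as input, decrypts) a buffer with AES-256-CTR.
// `key` must be 32 bytes and `iv` 16 bytes.
std::vector<unsigned char> AesCtrCrypt(const unsigned char* key, const unsigned char* iv,
                                       const std::vector<unsigned char>& input) {
  std::vector<unsigned char> output(input.size());
  int out_len = 0;
  int final_len = 0;
  EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
  EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), nullptr, key, iv);
  EVP_EncryptUpdate(ctx, output.data(), &out_len,
                    input.data(), static_cast<int>(input.size()));
  // CTR is a stream mode, so no padding bytes are produced here.
  EVP_EncryptFinal_ex(ctx, output.data() + out_len, &final_len);
  EVP_CIPHER_CTX_free(ctx);
  return output;
}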

Design

Key management

Most of the design and implementation is inspired by Apache Kudu and TiKV; see Kudu data at rest encryption and TiKV encryption. Thanks to both projects!

For Pegasus, an overview of the design:

New Configurations

Implementation overview

RocksDB

Encryption file header

The encrypted Env has a fixed-length header, which we can define as 4096 bytes (one page). The first 64 bytes are used to store the encryption information, including:

char magic[7];         // "encrypt"
uint8_t algorithm[1];  // encryption algorithm, e.g. AES128/192/256 CTR
char file_key[32];     // the 32-byte encrypted file key (EFK)
char reserved[24];     // reserved, pads the encryption info to 64 bytes
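
A sketch of how a reader could validate this header, assuming the 64-byte layout above (the struct and function names are hypothetical, not existing Pegasus APIs):

#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical in-memory form of the 64-byte encryption info described above.
struct EncryptionHeader {
  uint8_t algorithm = 0;           // e.g. an enum value for AES128/192/256 CTR
  std::string encrypted_file_key;  // the 32-byte EFK (the file key encrypted with the server's SK)
};

// Parses the first 64 bytes of the 4096-byte header page. Returns false if the
// magic does not match, i.e. the file was not written by the encrypted Env.
bool ParseEncryptionHeader(const char* buf, EncryptionHeader* header) {
  if (std::memcmp(buf, "encrypt", 7) != 0) {
    return false;
  }
  header->algorithm = static_cast<uint8_t>(buf[7]);
  header->encrypted_file_key.assign(buf + 8, 32);
  return true;
}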

Encrypting data

facebook/rocksdb uses ROT13 to encrypt data; it's just a sample and cannot be used in a production environment. We will use AES encryption algorithms instead.

tikv/rocksdb and Kudu have implemented AES encryption by using OpenSSL, and we will use the OpenSSL library as well.

Git repository

Because we are planning to add AES encryption to RocksDB, and it would likely be a long journey to merge the modified code into the upstream facebook/rocksdb repository, I suggest maintaining a Pegasus-owned git repository (i.e. https://github.com/pegasus-kv/rocksdb). We can submit the patches upstream once the feature is fully tested and stable.

Pegasus currently uses official RocksDB 6.6.4, so this is also a chance to upgrade the third-party library to the latest stable version (8.3.2 at the time of writing).

Pegasus

Git repository

I'm planning to develop the functionality on the master branch of apache/incubator-pegasus after the 2.5 branch has been created.

Module updates

native_linux_aio_provider

In fact, the native_linux_aio_provider module hasn't used AIO since Pegasus 2.2.0; instead it uses pwrite and pread.

RocksDB uses pwrite and pread too, so it's possible to replace the underlying filesystem implementation of Pegasus with rocksdb::Env.

rocksdb::Env provides plenty of file operation features, including mmap, direct I/O, prefetch, preallocation, encryption at rest, and so on. These are public APIs of the RocksDB library, and we trust the stability of RocksDB.

So we will introduce rocksdb::Env to Pegasus as the underlying implementation of the filesystem layer.
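
As a small sketch of what the refactored filesystem layer could look like, file writes would go through a rocksdb::Env handle instead of raw pwrite; when data at rest encryption is enabled, the same code path would simply be handed an encrypted Env (e.g. one created via rocksdb::NewEncryptedEnv) instead of rocksdb::Env::Default().

#include <memory>
#include <string>

#include <rocksdb/env.h>
#include <rocksdb/slice.h>
#include <rocksdb/status.h>

// Writes `data` to `path` through the given rocksdb::Env. Whether the bytes are
// encrypted on disk depends only on which Env is passed in, so callers stay
// unaware of the encryption details.
rocksdb::Status WriteFileThroughEnv(rocksdb::Env* env, const std::string& path,
                                    const rocksdb::Slice& data) {
  std::unique_ptr<rocksdb::WritableFile> file;
  rocksdb::Status s = env->NewWritableFile(path, &file, rocksdb::EnvOptions());
  if (!s.ok()) {
    return s;
  }
  s = file->Append(data);
  if (s.ok()) {
    s = file->Close();
  }
  return s;
}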

plog

plog uses native_linux_aio_provider, so once native_linux_aio_provider implements data at rest encryption, plog logically gets this feature as well.

nfs

The nfs module is used to transfer files (e.g. RocksDB SST files) between replica servers. The files are encrypted if data at rest encryption is enabled, and different replica servers have different SKs, so the nfs server side should support decrypting data when uploading (by using the source SK), and the nfs client side should support encrypting data when downloading (by using the target SK).

The nfs module uses native_linux_aio_provider too, so it's convenient to support encryption for the nfs module as well.

block service

The block service module is used to back up and restore data; it supports 3 types of targets: local filesystem, Xiaomi FDS and Apache HDFS. We should also provide encryption in the block service to ensure data security. However, the corresponding SK needs to be backed up and restored along with the data: the backed-up SK will be used to decrypt the data when downloading it in the restore stage, and the data will be encrypted again with the replica server's own SK when it is written in the restore stage.

logs

User key-values printed in logs should be redacted.
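
For instance, a hypothetical helper (not an existing Pegasus API) could replace the user data with a placeholder that only keeps the length:

#include <cstdio>
#include <string>

// Hypothetical redaction helper: keeps the length of the user key or value so the
// log line stays useful for debugging, without leaking the content itself.
std::string Redact(const std::string& user_data) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "<redacted:%zu bytes>", user_data.size());
  return std::string(buf);
}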

others

Some other modules that read/write files could also be refactored to use rocksdb::Env, e.g. the replica_app_info module.

Roadmap

Prepare the rocksdb repository

Commits are merged to https://github.com/pegasus-kv/rocksdb/tree/v8.3.2-pegasus-encrypt first.

Cherry-pick encryption related commits from TiKV

Commits are cherry-picked from the branch https://github.com/tikv/rocksdb/commits/6.29.tikv

Remove the key manager

Implement the self-served file key management

Update rocksdb to 8.5.3

Other fixes of pegasus-kv/rocksdb

Pegasus uses rocksdb::EncryptedEnv when data at rest encryption is enabled

Refactor Pegasus to use rocksdb::Env to access other disk files

GiantKing commented 1 year ago
  1. The config unit for encrypt_data_at_rest is the cluster? Why not the table?
  2. In the bulkload process, we generate the underlying data files with Spark, so we need to ensure that the RocksDB data parser/generator still works well.
acelyc111 commented 1 year ago

Hi @GiantKing , thanks for your reply!

  1. In the first step, I just want to implement cluster-granularity encryption. It would be easy to extend it to table-granularity encryption later.
  2. Sorry, I didn't get your key point. Does the design break some rule?
kirbyzhou commented 1 year ago

The authentication credentials required to connect to the KMS are missing.

kirbyzhou commented 1 year ago

Good question. It depends on how the RocksDB data is distributed among multiple replica servers.

One possible solution: Spark encrypts the FK of each RocksDB instance with a unified BK, then imports them into Pegasus along with the BK. Pegasus uses the BK to decrypt the FK, re-encrypts the FK with its own SK, and writes it into the RocksDB file header.

  • In the bulkload process, we generate the underlying data files with Spark, so we need to ensure that the RocksDB data parser/generator still works well.
acelyc111 commented 1 year ago
  • pegasus-spark only supports reading plaintext data from the source, and the generated data is in plaintext as well, so it doesn't break security. When the generated plaintext data is loaded into Pegasus, it will be encrypted if the encrypt_data_at_rest feature is enabled.

I added this as a non-goal.

acelyc111 commented 1 year ago

There is another pull request to facebook/rocksdb, https://github.com/facebook/rocksdb/pull/7020, but it doesn't seem to have been updated for nearly 3 years.