elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

HDFS Repository should support secured HDFS Encryption #71698

Open jbaiera opened 3 years ago

jbaiera commented 3 years ago

The HDFS Repository currently fails when it reads or writes encrypted files on a secured HDFS.

HDFS implements encryption zones in the filesystem by designating a key that the client must use to encrypt data before it is sent to the server for storage, and to decrypt it after it is read back. This means that the data is never held on HDFS unencrypted. Hadoop provides a Key Management Server (KMS) to manage encryption keys for all clients. Client code connects and authenticates to the KMS over HTTPS/SPNEGO in order to obtain keys. The file stream to or from HDFS is then encrypted or decrypted on the client.

The root of the problem is how the HDFS client manages user credentials when connecting to the KMS. Kerberos credentials are stored in the user Subject that the HDFS client captures for subsequent operations. The client uses this Subject when connecting to the NameNode or a DataNode in order to perform the GSS dance to authenticate, but not when connecting to the KMS. The KMS client instead assumes that the calling thread is already operating as the Subject in question.

We should be explicitly assuming the identity of the authenticated principal when interacting with the HDFS client instead of relying on the client to do it for us. Additionally, in order to officially support the usage of encryption, we should expand our tests to include testing on an HDFS instance encrypted with a KMS.
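
As a rough illustration of "explicitly assuming the identity", here is a minimal sketch that uses only the JDK's `javax.security.auth.Subject`. The class `SubjectScopedCalls` and its methods `subjectFor` and `runAs` are hypothetical names made up for this example, and the Subject is built by hand rather than from a real Kerberos login. In Hadoop code the equivalent wrapper would more likely be `UserGroupInformation.doAs`, so treat this as a sketch of the idea and not as the actual fix.

```java
import java.security.Principal;
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

// Hypothetical helper for illustration only; not Elasticsearch or Hadoop code.
class SubjectScopedCalls {

    // Builds a Subject holding a single named principal. A real repository
    // would obtain its Subject from a Kerberos (JAAS) login instead.
    static Subject subjectFor(String principalName) {
        Subject subject = new Subject();
        // Principal has a single abstract method, getName(), so a lambda works.
        Principal principal = () -> principalName;
        subject.getPrincipals().add(principal);
        return subject;
    }

    // Runs the action with the given Subject associated with the calling
    // thread, so code that expects to already be "operating as" that Subject
    // (as the KMS client does) can find the credentials.
    // Subject.doAs is deprecated in newer JDKs in favour of Subject.callAs.
    @SuppressWarnings({"removal", "deprecation"})
    static <T> T runAs(Subject subject, PrivilegedAction<T> action) {
        return Subject.doAs(subject, action);
    }

    public static void main(String[] args) {
        Subject subject = subjectFor("elasticsearch@EXAMPLE.COM");
        // Stand-in for an HDFS client call such as opening an encrypted file.
        String result = runAs(subject,
            () -> "read blob as " + subject.getPrincipals().iterator().next().getName());
        System.out.println(result);
    }
}
```

The point of the wrapper is that every call into the HDFS client, including the ones that end up talking to the KMS, happens inside the `runAs` scope, so the repository no longer depends on the client applying the Subject itself.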

elasticmachine commented 3 years ago

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine commented 3 years ago

Pinging @elastic/es-core-features (Team:Core/Features)

elasticsearchmachine commented 10 months ago

Pinging @elastic/es-data-management (Team:Data Management)