SovereignCloudStack / issues

This repository is used for issues that are cross-repository or not bound to a specific repository.
https://github.com/orgs/SovereignCloudStack/projects/6

Implementation of E2E encryption between OpenStack services in upstream ansible Kolla #462

Open fdobrovolny opened 10 months ago

fdobrovolny commented 10 months ago

Update 31.5.2024: current state: https://scs.sovereignit.de/nextcloud/s/G4DzznMKeJHqMBx

Introduction

OpenStack is a platform used for the SCS IaaS layer (Infrastructure as a Service). Currently, the deployment is mostly done using Kolla. This EPIC will implement end-to-end TLS encryption of selected OpenStack services shown below.

Services

This epic will focus on the following services as they are the default ones in our IaaS layer.

  1. Keystone - Keystone is the identity service in OpenStack, responsible for user authentication and authorization and managing domains, projects, and roles.

  2. Horizon - Horizon is the dashboard of OpenStack, providing a web-based user interface to other services, allowing administrators and users to manage resources and services.

  3. Glance - Glance is the image service in OpenStack, responsible for discovering, registering, and retrieving virtual machine images.

  4. Cinder - Cinder is the block storage service, providing persistent block storage resources for virtual machine instances.

  5. Placement - Placement is a service for tracking resource provider inventories and usages, helping in the efficient allocation of resources across the cluster.

  6. Nova - Nova is the compute service in OpenStack, responsible for creating and managing virtual machines and compute instances.

  7. Neutron - Neutron is the networking service providing network connectivity between interface devices managed by other OpenStack services.

  8. Heat - Heat is the orchestration service, allowing developers to define and deploy composite cloud applications using templates.

  9. Memcached - Memcached is a caching layer in OpenStack, improving performance by alleviating database load.

  10. MariaDB - MariaDB is the database backend for OpenStack services, storing and managing data efficiently.

  11. RabbitMQ - RabbitMQ is the messaging service used in OpenStack for communication between the different components and services.

  12. Redis - Redis is a database, cache, and message broker within OpenStack, ensuring high-performance data management.

  13. OpenvSwitch - OpenvSwitch is a multilayer virtual switch implemented to ensure effective network automation in an OpenStack environment.

  14. libvirt - Used for virtualization

  15. Octavia - Load balancing service

  16. CloudKitty - Billing service

  17. Barbican - storage of encryption keys

  18. Designate - DNS as a service

Current state

Non-internal OpenStack endpoints

OpenStack API: External client
   EC            --|TLS|--         HA          --|HTTP|--               OS
(External)   (public internet) (HAProxy) (OpenStack Networking)   (OpenStack Services)
   |                                                                  |      |      |
 Client                                                           Keystone  Glance  Placement ...

The communication between external clients and the OpenStack services is crucial. It must be secured to prevent data leakage and unauthorized access, because this data often travels across the open internet, where it is highly susceptible to interception and misuse.

Kolla, out of the box, provides the ability to encrypt external traffic using a HAProxy. Details

OpenStack API: Internal client
   IC            --|TLS|--         HA          --|HTTP|--               OS
(Internal) (OpenStack Networking) (HAProxy) (OpenStack Networking)   (OpenStack Services)
   |                                                                  |      |      |
 Client                                                          Keystone  Glance  Placement ...

Communication from clients inside the OpenStack network is the second most vulnerable path. It is typically used to manage resources from CI/CD tools or from services such as the SCS KaaS layer. An attacker could intercept this traffic, for example, from a compromised CI build.

Kolla, out of the box, can encrypt internal traffic using a HAProxy. Details

HAProxy to backend services
   EC            --|TLS|--         HA          --|TLS|--                   OS
(External)   (public internet) (HAProxy) (OpenStack Networking)   (OpenStack Services)
   |                                                                  |      |      |
 Client                                                           Keystone  Glance  Placement ...

The communication from HAProxy to the respective services is much harder to intercept, as this traffic is internal to OpenStack; however, an attacker who has compromised a single service could intercept all API traffic.

Kolla enables end-to-end TLS encryption from HAProxy to services that support TLS termination. Of the services of interest, the following are currently supported:

Only RabbitMQ is not currently supported.

Details

Backend services to their data services

   EC            --|TLS|--         HA          --|TLS|--                   OS             --|TLS|--                  DS
(External)   (public internet) (HAProxy) (OpenStack Networking)   (OpenStack Services)  (OpenStack Networking)  (Data Service)
   |                                                                  |      |      |                            |         |
 Client                                                           Keystone  Glance  Placement ...               MariaDB   Redis ...

To provide truly end-to-end traffic encryption within our infrastructure, all communications between services and their respective databases and brokers must be conducted via TLS (Transport Layer Security). This measure not only encrypts the data in transit but also ensures the communication channels are authenticated and integrity-protected.
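As a minimal sketch of what "encrypted, authenticated, and integrity-protected" means on the client side, the Python snippet below builds a TLS context that refuses unverified peers. It uses only the standard-library `ssl` module. The function name `make_strict_client_context`, the optional `cafile` parameter, and the choice of TLS 1.2 as the minimum version are illustrative assumptions, not something Kolla or any OpenStack service provides.

```python
import ssl
from typing import Optional


def make_strict_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate.

    `cafile` is a hypothetical path to an internal CA bundle; when omitted,
    the system trust store is used.
    """
    # create_default_context() enables certificate verification and
    # hostname checking when used for Purpose.SERVER_AUTH.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    # Refuse legacy protocol versions (assumption: TLS 1.2 is an acceptable floor).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A client that wraps its socket with such a context fails the handshake if the data service presents a certificate that does not chain to a trusted CA or does not match the host name, which is the authentication property described above.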

Memcached

Memcached does support TLS; however, this support is only experimental, and custom-built binaries are needed. For our use case, this is a no-go.

MariaDB

MariaDB has production-ready support for TLS encryption, and Galera Cluster also supports TLS encryption between nodes. Details, Galera Details

Kolla does not currently support the installation of MariaDB with TLS enabled.

RabbitMQ

RabbitMQ supports TLS encryption for client connections as well as for inter-node communication. Details

Kolla currently supports encryption of:

And does not support encryption of:

Details

Redis

Redis supports TLS encryption for client and inter-node communication; however, this has to be enabled during the build and is not enabled by default. Details

Kolla currently does not support TLS encryption.

Note: We had issues with oslo.cache connecting to Redis via TLS; a minor upstream change might be needed.
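To keep track of the hops discussed in this section, the state described above can be captured as data. The following Python sketch is only a transcription of the text in this issue. The `Hop` class, the status labels, and the `gaps` helper are hypothetical names introduced for illustration, and RabbitMQ is marked "partial" because this issue describes its Kolla support as partial without the details being reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One communication hop and its TLS status as described in this issue."""
    name: str
    status: str  # "yes", "partial", or "no" (labels are an assumption)
    note: str = ""


# Status values transcribed from the "Current state" text above.
CURRENT_STATE = [
    Hop("external client -> HAProxy", "yes"),
    Hop("internal client -> HAProxy", "yes"),
    Hop("HAProxy -> backend services", "partial", "RabbitMQ not supported"),
    Hop("services -> Memcached", "no", "TLS support is experimental"),
    Hop("services -> MariaDB", "no", "MariaDB itself supports TLS"),
    Hop("services -> RabbitMQ", "partial"),
    Hop("services -> Redis", "no", "Redis needs a TLS-enabled build"),
]


def gaps(hops):
    """Return the names of hops that are not yet fully covered by TLS."""
    return [h.name for h in hops if h.status != "yes"]


for name in gaps(CURRENT_STATE):
    print(name)
```

Keeping the state in one structure like this makes it easy to update a single entry as upstream support lands and to see at a glance which hops still break the end-to-end chain.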

Motivation

As a part of our ongoing efforts to maintain high-security standards, it's crucial to encrypt the OpenStack API and OpenStack internal communication.

Encrypting this internal traffic adds a strong layer of security. It helps protect our data and operations from potential internal threats. Even if unauthorized individuals gain access to our network, the encrypted communication ensures they cannot understand or misuse the shared data across different services within the cluster.

The purpose of this Decision Record is to outline why we need to encrypt internal traffic, what solutions are proposed, and what solution we decide to implement for our OpenStack clusters. We will explain the technical considerations, expected benefits, and possible challenges tied to this initiative, and summarize the present status of the various services within our ecosystem. This document provides a clear and detailed account of our decision-making process, serving as a helpful reference for any similar security enhancement efforts in the future.

Proposal for changes in Kolla

In light of the analysis of the current state of TLS within our Kolla-OpenStack environment, several steps are proposed to enhance the security of the Internal traffic within our SCS OpenStack clusters.

Non-internal OpenStack endpoints

OpenStack API: External client :white_check_mark:

No action is needed.

OpenStack API: Internal client :white_check_mark:

No action is needed.

HAProxy to backend services :paperclip:

Implement TLS between HAProxy and RabbitMQ.

Backend services to their data services

Caching / Memcached :construction:

Considering the experimental nature of TLS support in Memcached, we recommend transitioning to Redis as the caching layer due to its more reliable TLS encryption support. Since Redis can effectively replace all the functions of Memcached, and given that the Redis setup is already in place within Kolla, we propose replacing Memcached with Redis.

Database / MariaDB :paperclip:

Our proposal involves enhancing Kolla to include support for the following:

Message Broker / RabbitMQ :paperclip:

Since RabbitMQ already has partial TLS support in Kolla, we recommend completing the support by adding the following functionalities:

Caching / Redis :paperclip:

We propose enhancing Kolla by introducing comprehensive support for Redis TLS.

Libvirt :white_check_mark:

Mentioned by @artificial-intelligence

It is possible to use TLS for communication between nova and libvirt. Libvirt already supports using SSL for live migrations; however, Kolla Ansible lacks a way to enable this.

https://github.com/SovereignCloudStack/standards/pull/370#discussion_r1409047854

https://docs.openstack.org/kolla-ansible/latest/reference/compute/libvirt-guide.html

https://docs.openstack.org/nova/latest/admin/secure-live-migration-with-qemu-native-tls.html

https://github.com/openstack/kolla-ansible/blob/a3f3dc7ab5e1bed82bee9a0a8563e0e812e90b6c/ansible/roles/nova-cell/templates/libvirtd.conf.j2#L4

UPDATE: https://github.com/SovereignCloudStack/issues/issues/533 - Christian pointed out my mistake: I was looking at the wrong config. Kolla Ansible already provides this functionality.

Octavia :paperclip:

Octavia already requires SSL certificates for its primary function, and Kolla Ansible generates them automatically. However, they are not used for communication with backend services. Also, the Octavia API is exposed via HAProxy.

Octavia uses:

https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html

https://github.com/openstack/kolla-ansible/blob/a3f3dc7ab5e1bed82bee9a0a8563e0e812e90b6c/ansible/roles/octavia/templates/octavia.conf.j2

https://docs.openstack.org/octavia/latest/install/install-ubuntu.html

CloudKitty :paperclip:

CloudKitty uses:

Barbican :paperclip:

Barbican uses:

Designate :paperclip:

Designate uses:

Definition of Ready:

Definition of Done:

artificial-intelligence commented 8 months ago

I am still missing an answer to my questions in https://github.com/SovereignCloudStack/standards/pull/370#issuecomment-1835766194

Specifically: was each OpenStack service actually investigated, or on what other basis was it decided which services need TLS security?

I still see gaps here, and I see no explanation of why they exist; for example, neither libvirt live migration nor CloudKitty was mentioned initially.

I just remembered these services off the top of my head, so it may well be that we are missing more services.

Or is the goal of this issue not to implement E2E encryption between all OpenStack services? If so, it would be good to mention this somewhere; at least it's not clear to me from the current proposals/issues.

horazont commented 8 months ago

FTR: We have seen issues with reloading TLS certificates on the galera/replication endpoints of MySQL/Galera without a restart of the node. So plan for rolling restarts of the galera nodes once you enable TLS on the galera replication endpoints.
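A rolling restart of this kind could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a Kolla feature: `restart` and `is_synced` are hypothetical callbacks the operator would supply (for example, wrapping a container restart and a cluster-state query), and the polling parameters are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    restart: Callable[[str], None],
    is_synced: Callable[[str], bool],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> List[str]:
    """Restart nodes one at a time, waiting for each to rejoin the cluster.

    `restart` and `is_synced` are hypothetical, operator-supplied callbacks.
    """
    done: List[str] = []
    for node in nodes:
        restart(node)
        for _ in range(max_polls):
            if is_synced(node):
                break
            time.sleep(poll_seconds)
        else:
            # Stop early so the remaining nodes keep serving traffic.
            raise RuntimeError(f"{node} did not resync; restarted so far: {done}")
        done.append(node)
    return done


# Dry run with stand-in callbacks: every node reports as synced immediately.
restarted: List[str] = []
print(rolling_restart(["db1", "db2", "db3"], restarted.append, lambda n: True))
```

The important design choice is that the loop aborts as soon as one node fails to resync, so a bad certificate never takes down more than one member of the cluster at a time.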

There is also anecdotal evidence of weird behaviour with TLS and RabbitMQ (spurious "certificate has expired" messages), but that was not reproducible outside one specific environment.
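When chasing messages like this, one simple check is to compare the certificate's actual `notAfter` timestamp with the clock of the node that reports the error. The sketch below uses only Python's standard `ssl` module; the helper name `seconds_until_expiry` is an assumption made for illustration, and it is not a diagnosis of the behaviour described above.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the seconds remaining until a certificate's notAfter timestamp.

    `not_after` uses the format found in SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". A negative result means expired.
    """
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) - now


# A certificate that expired in 2018 yields a negative value today.
print(seconds_until_expiry("Jan  5 09:34:43 2018 GMT") < 0)
```

Passing `now` explicitly makes it possible to evaluate the same certificate against the clock value of another host, which helps rule out clock skew as the cause.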

fdobrovolny commented 8 months ago

There will be a breakout session to discuss this topic. Please vote for the most suitable time. Voting closes today at 13:00 CEST; the result will be announced here.

https://dud-poll.inf.tu-dresden.de/EZjR7-W7fA/

  • @berendt
  • @josephineSei
  • @MatusJenca2
  • @ignatov17
  • @bitkeks
  • @artificial-intelligence
  • @horazont

Friday 14:00 CEST has been selected; we will meet at the following link:

https://conf.scs.koeln:8443/SCS-Tech

fkr commented 8 months ago

@fdobrovolny feel free to add a one-time event in the public calendar.

fdobrovolny commented 8 months ago

Questions:

Source: https://docs.openstack.org/oslo.cache/latest/configuration/index.html#cache.backend

For eventlet-based or environments with hundreds of threaded servers, Memcache with pooling (oslo_cache.memcache_pool) is recommended. For environments with less than 100 threaded servers, Memcached (dogpile.cache.memcached) or Redis (dogpile.cache.redis) is recommended.
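Read as a decision rule, the quoted guidance could be sketched like this in Python. The backend names are taken from the quote; the function name `recommend_cache_backends` is hypothetical, and treating exactly 100 threaded servers as the cut-off is an assumption, since the quote only says "hundreds" and "less than 100".

```python
from typing import List


def recommend_cache_backends(threaded_servers: int, eventlet: bool = False) -> List[str]:
    """Map the quoted oslo.cache guidance onto backend names.

    The 100-server threshold is an assumption made for illustration.
    """
    if eventlet or threaded_servers >= 100:
        return ["oslo_cache.memcache_pool"]
    return ["dogpile.cache.memcached", "dogpile.cache.redis"]


print(recommend_cache_backends(20))
print(recommend_cache_backends(300))
print(recommend_cache_backends(20, eventlet=True))
```

Under this reading, Redis is only among the recommended backends for the smaller, non-eventlet case, which is relevant when weighing the proposal to replace Memcached with Redis.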

fdobrovolny commented 6 months ago

As an outcome of the meeting above, we need to take a look at these additional services:

The description of this task was updated accordingly. So far, the state is the following:

fdobrovolny commented 6 months ago

Continuing my previous comment.

@berendt raised the question of whether we really want to support CloudKitty: https://github.com/SovereignCloudStack/issues/issues/562#issuecomment-1994333903

CloudKitty

fdobrovolny commented 6 months ago

Continuing

Barbican:

fdobrovolny commented 6 months ago

Designate:

OgarOgarovic commented 3 months ago

This is the current state of implementation of encryption in kolla-ansible: https://scs.sovereignit.de/nextcloud/s/G4DzznMKeJHqMBx