a-mango / CLD-Workshop


OpenShift

WORKSHOP TEAM 2 (Valentin Bugna, Walid Slimani, Aubry Mangold, Simon Guggisberg)

POC objectives

Validate the use of OpenShift for deploying and managing a multi-tier architecture in a hybrid cloud context, showcasing cross-cluster deployments, scaling, and failover capabilities.

Infra architecture

TODO: Logical components, ports/protocols, cloud type.

Diagram

Scenario

Describe step-by-step the scenario. Write it using this format (BDD style).

Summary

The scenario describes the setup of a multi-cloud OpenShift environment. The on-premises infrastructure hosts a RHEL management workstation, a PostgreSQL database, and an OpenShift cluster. AWS is used to provision an OpenShift cluster and a Route 53 DNS failover configuration between the two clusters. The AWS cluster is set up with 5 nodes and a load balancer to route traffic to them. The on-premises cluster consists of a single OpenShift node.

The open source collaboration platform Mattermost is used to showcase a 2-tier architecture. It consists of an application and a database. The images are stored in a container registry. Route 53 is used for DNS-based failover routing. The application is tested for functionality, load, and failover scenarios.

Note: the on-premises infrastructure is simulated using either a local hypervisor (on a laptop at school or in one of the team members' homelabs) or a cloud-based hypervisor.


Feature 1: Cluster Setup

Task 1: Set Up RHEL Management Workstation on On-Premises Infrastructure

Task 2: Provision AWS Instances for OpenShift
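In practice the OpenShift IPI installer (openshift-install) provisions the AWS nodes itself; purely as an illustration, manually provisioning the instance types listed in the cost analysis could look like the following boto3 sketch. The AMI, key pair, subnet, and region are placeholders, not values from the POC.

```python
# Hypothetical sketch: provisioning the cluster-sized EC2 instances with boto3.
# The AMI ID, key pair, subnet and region below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Instance types and counts taken from the cost analysis table further down.
node_types = {"m5.2xlarge": 3, "m5a.xlarge": 2, "r5.xlarge": 2}

for instance_type, count in node_types.items():
    ec2.run_instances(
        ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder RHCOS/RHEL AMI
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
        KeyName="cld-workshop-key",        # placeholder key pair
        SubnetId="subnet-xxxxxxxx",        # placeholder subnet
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": f"openshift-{instance_type}"}],
        }],
    )
```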

Task 3: Set up Load Balancer for AWS Cluster
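A minimal, hypothetical boto3 sketch of a network load balancer forwarding HTTPS traffic to the cluster nodes; the names, subnets, and VPC are placeholders (the installer normally creates the cluster load balancers automatically).

```python
# Hypothetical sketch: a network load balancer in front of the AWS cluster,
# forwarding TCP 443 to a target group of cluster nodes. All IDs are placeholders.
import boto3

elbv2 = boto3.client("elbv2", region_name="eu-west-1")

lb = elbv2.create_load_balancer(
    Name="openshift-ingress-lb",
    Subnets=["subnet-aaaaaaaa", "subnet-bbbbbbbb"],
    Type="network",
    Scheme="internet-facing",
)

tg = elbv2.create_target_group(
    Name="openshift-ingress-tg",
    Protocol="TCP",
    Port=443,
    VpcId="vpc-xxxxxxxx",
    TargetType="instance",
)

elbv2.create_listener(
    LoadBalancerArn=lb["LoadBalancers"][0]["LoadBalancerArn"],
    Protocol="TCP",
    Port=443,
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": tg["TargetGroups"][0]["TargetGroupArn"],
    }],
)
```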

Task 4: Set up OpenShift Cluster on AWS

Task 5: Provision On-Premises Infrastructure for OpenShift

Task 6: Set up OpenShift Cluster On-Premises


Feature 2: Multi-Tier Application Setup

Task 1: Set Up PostgreSQL on RDS
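A hedged sketch of what provisioning the Mattermost database on RDS could look like with boto3; the identifier, instance class, storage size, and credentials are assumptions, not the values used in the POC.

```python
# Hypothetical sketch: a small PostgreSQL instance on RDS for the Mattermost
# database. Identifier, credentials and sizing are placeholder assumptions.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

rds.create_db_instance(
    DBInstanceIdentifier="mattermost-db",
    Engine="postgres",
    DBInstanceClass="db.t3.micro",      # placeholder instance class
    AllocatedStorage=20,                # GiB, placeholder
    MasterUsername="mmuser",
    MasterUserPassword="change-me",     # placeholder; use a secrets store in practice
    PubliclyAccessible=False,
    MultiAZ=False,
)
```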

Task 2: Build and Store Application Images

Task 3: Create BuildConfig in OpenShift

Task 4: Deploy Application Tiers to All Clusters

Task 5: Configure Route 53 for DNS-Based Failover
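A hypothetical boto3 sketch of the intended failover routing: a health check on the primary (AWS) endpoint plus PRIMARY/SECONDARY failover records pointing at the two clusters. The hosted zone, domain names, and endpoints are placeholders.

```python
# Hypothetical sketch: DNS-based failover between the AWS cluster (primary)
# and the on-premises cluster (secondary). All names and IDs are placeholders.
import boto3
import uuid

r53 = boto3.client("route53")

# Health check against the primary (AWS) ingress endpoint.
health = r53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "Type": "HTTPS",
        "FullyQualifiedDomainName": "apps.aws.example.com",  # placeholder
        "Port": 443,
        "ResourcePath": "/healthz",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)

def failover_record(identifier, role, target, health_check_id=None):
    record = {
        "Name": "mattermost.example.com",   # placeholder application FQDN
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": identifier,
        "Failover": role,                   # "PRIMARY" or "SECONDARY"
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

r53.change_resource_record_sets(
    HostedZoneId="ZXXXXXXXXXXXXX",          # placeholder hosted zone
    ChangeBatch={"Changes": [
        failover_record("aws", "PRIMARY", "apps.aws.example.com",
                        health["HealthCheck"]["Id"]),
        failover_record("onprem", "SECONDARY", "apps.onprem.example.com"),
    ]},
)
```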


Feature 3: Testing and Validation

Task 1: Verify Application Functionality on AWS

Task 2: Verify Application Functionality On-Premises

Task 3: Perform Load Testing Across Clusters
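Since the application never ran, no load test was performed; the following sketch only illustrates the kind of quick latency check that was planned, using the Python standard library and a placeholder URL (a dedicated tool such as k6 or Locust would be preferable for a real test).

```python
# Hypothetical sketch: a very small load test hitting the application URL from
# many threads and reporting error count and latency percentiles.
import concurrent.futures
import statistics
import time
import urllib.request

URL = "https://mattermost.example.com/"   # placeholder application endpoint
REQUESTS = 200
WORKERS = 20

def timed_request(_):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            resp.read()
            ok = 200 <= resp.status < 400
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(timed_request, range(REQUESTS)))

latencies = sorted(t for ok, t in results if ok)
errors = sum(1 for ok, _ in results if not ok)
if latencies:
    print(f"ok={len(latencies)} errors={errors} "
          f"p50={statistics.median(latencies):.3f}s "
          f"p95={latencies[int(0.95 * len(latencies)) - 1]:.3f}s")
else:
    print(f"all {errors} requests failed")
```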

Task 4: Validate Cross-Cluster Failover
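Validating failover would amount to watching which endpoint the Route 53 record resolves to while the primary cluster is taken down; a stub with placeholder names (stop it with Ctrl-C):

```python
# Hypothetical sketch: poll the failover record and log whenever the set of
# resolved addresses changes. The FQDN is a placeholder.
import socket
import time

FQDN = "mattermost.example.com"   # placeholder failover record
INTERVAL = 15                     # seconds between lookups

previous = None
while True:
    try:
        addresses = sorted({info[4][0] for info in socket.getaddrinfo(FQDN, 443)})
    except socket.gaierror:
        addresses = ["<resolution failed>"]
    if addresses != previous:
        print(time.strftime("%H:%M:%S"), "->", ", ".join(addresses))
        previous = addresses
    time.sleep(INTERVAL)
```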

Task 5: Test Autoscaling
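Autoscaling on OpenShift is driven by a HorizontalPodAutoscaler; a hedged sketch using the official Kubernetes Python client (the same object can be created with `oc autoscale`), with placeholder namespace and deployment names:

```python
# Hypothetical sketch: a CPU-based HPA for the application deployment.
# Namespace, deployment name and thresholds are placeholder assumptions.
from kubernetes import client, config

config.load_kube_config()   # uses the current oc/kubectl context

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="mattermost", namespace="mattermost"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="mattermost"),
        min_replicas=2,
        max_replicas=6,
        target_cpu_utilization_percentage=75,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="mattermost", body=hpa)
```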

Cost Analysis

TODO: analysis of load-related costs.

This analysis covers the cost components involved in setting up a multi-cloud OpenShift environment using on-premises infrastructure and AWS. Since Mattermost could not be deployed, we cannot provide a precise cost analysis that accounts for autoscaling. With an OpenShift cluster, however, autoscaling costs would be driven by the number of instances the cluster scales out to.

The instances and resources used in the AWS cluster are the following:

| Instance Type | vCPUs | RAM (GiB) | Cost per Hour | Number of Instances |
|---------------|-------|-----------|---------------|---------------------|
| m5.2xlarge    | 8     | 32        | $0.384        | 3                   |
| m5a.xlarge    | 4     | 16        | $0.172        | 2                   |
| r5.xlarge     | 4     | 32        | $0.376        | 2                   |

Total specs of the EC2 instances: 40 vCPUs and 192 GiB of RAM.

We estimate the cost of the Proxmox servers based on the vCPUs and RAM used by the AWS instances. Matching the 192 GiB of RAM and 40 vCPUs would require roughly three high-end servers with 64 GiB of RAM and 16 vCPUs each. Assuming new HPE ProLiant Gen11 servers with appropriate specifications at around $4,200 per server, buying three servers would cost $12,600 in total.
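The sizing above can be reproduced with a few lines of arithmetic (the $4,200 per-server price is the estimate quoted above):

```python
# Sanity check of the sizing: total vCPUs/RAM of the AWS instances and the
# number of 16-vCPU / 64-GiB servers needed to match them.
import math

instances = {               # (vCPUs, RAM GiB, count), from the table above
    "m5.2xlarge": (8, 32, 3),
    "m5a.xlarge": (4, 16, 2),
    "r5.xlarge":  (4, 32, 2),
}

total_vcpus = sum(cpu * n for cpu, _, n in instances.values())
total_ram   = sum(ram * n for _, ram, n in instances.values())
servers = max(math.ceil(total_vcpus / 16), math.ceil(total_ram / 64))

print(f"{total_vcpus} vCPUs, {total_ram} GiB RAM -> {servers} servers "
      f"-> ${servers * 4200:,} upfront")
# 40 vCPUs, 192 GiB RAM -> 3 servers -> $12,600 upfront
```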

Cost Breakdown

1. On-Premises Infrastructure

2. AWS Infrastructure

Total Monthly and Upfront Costs

| Cost Component                  | Upfront Cost | Monthly Cost |
|---------------------------------|--------------|--------------|
| Proxmox Servers                 | $12,600      | $0           |
| Internet and Electricity Costs  | $0           | $800.00      |
| AWS EC2 Instances               | $0           | $1,618.56    |
| AWS Elastic Load Balancer       | $0           | $32.76       |
| AWS Elastic IPs (2)             | $0           | $7.20        |
| AWS RDS for PostgreSQL          | $0           | $29.95       |
| AWS Route 53                    | $0           | $0.90        |
| Total                           | $12,600      | $2,489.37    |
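The EC2 figure and the monthly total can be reproduced as follows, assuming a 720-hour (30-day) month, which matches the $1,618.56 in the table:

```python
# Reproducing the monthly figures above from the per-hour EC2 prices.
hourly = {"m5.2xlarge": (0.384, 3), "m5a.xlarge": (0.172, 2), "r5.xlarge": (0.376, 2)}
ec2_monthly = sum(price * count for price, count in hourly.values()) * 720

monthly_costs = {
    "Internet and electricity": 800.00,
    "EC2 instances": ec2_monthly,        # 1618.56
    "Elastic Load Balancer": 32.76,
    "Elastic IPs (2)": 7.20,
    "RDS for PostgreSQL": 29.95,
    "Route 53": 0.90,
}

print(f"EC2: ${ec2_monthly:.2f}/month, total: ${sum(monthly_costs.values()):.2f}/month")
# EC2: $1618.56/month, total: $2489.37/month
```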

Other Considerations

Cost Reduction Strategies

TODO: option to reduce or adapt costs (practices, subscription)

To reduce costs, the following strategies could be considered:

Return of experience

TODO: take a position on the poc that has been produced.

The first feature of the scenario, Feature 1: Cluster Setup, was successfully implemented. After reading the documentation and obtaining the necessary IAM roles from our administrator, we were able to provision our AWS infrastructure. Provisioning the on-premises OpenShift infrastructure was more complicated because of the limited resources compared to AWS: the local hypervisor was limited to 64 GB of RAM and 16 vCPUs, which was not enough to run the OpenShift cluster installer. In the end, we resorted to a single-node cluster as described by Red Hat.

Due to technical limitations of the on-premises environment (namely the unavailability of certain ports), we had to set up a reverse proxy to access the OpenShift console and applications. Because of SSL issues, we were unable to access the console and applications from the WAN (we consistently got 502 Bad Gateway errors). As a workaround, we set up a VPN that allows us to interact with the on-premises server. The AWS cluster is accessible from the WAN.

The second feature of the scenario, Feature 2: Multi-Tier Application Setup, was not implemented due to the accumulation of errors and a lack of time. We were able to set up the PostgreSQL database on RDS and to set up an OpenShift image on GitHub, but we had trouble storing the application images in the container registry and accessing our volumes. Ultimately, the pods remained stuck in the Init:CrashLoopBackOff state and we were unable to access the application.

Because of this, we were unable to test the application's functionality, perform load testing, or validate cross-cluster failover and autoscaling.

The DNS configuration was set up on Route 53 for both clusters. Because the application was not working, we were unable to set up the failover mechanism.

Both clusters were linked in the Red Hat Hybrid Cloud Console.

The last feature of the scenario, Feature 3: Testing and Validation, was not carried out due to the lack of a working application: we could not verify the application's functionality on AWS or on-premises, perform load testing, validate cross-cluster failover, or test autoscaling.

The proof of concept validates that it is possible to set up two independent OpenShift clusters on-premises and in AWS, but not that high availability between them can be achieved using Route 53. We were unable to validate the application deployment and the failover mechanism due to technical issues. The cost analysis was done based on the resources used in the scenario.

TODO: did it validate the announced objectives?

The proof of concept did not validate all of the announced objectives because of a lack of time and many unforeseen problems. The application was not working, which prevented us from testing its functionality, performing load testing, and validating the failover mechanism and autoscaling. It did, however, validate the setup of two independent OpenShift clusters on-premises and in AWS.