The Howard project, named after "The Godfather of Clouds" Luke Howard, orchestrates the Kubernetes-based cloud infrastructure for the Canadian Food Inspection Agency's AI lab, managing applications like Nachet, Finesse, and Louis. It prioritizes robustness, security and efficiency.
Executive summary
The current Kubernetes cluster setup lacks the flexibility to switch node pool types, which limits our capacity for resource-intensive computing. This issue proposes creating a secondary Kubernetes cluster in Azure with a node pool that includes GPU capabilities, to optimize computational resources for AI-based projects.
Context
Our existing Kubernetes infrastructure does not support changes to the node pool configuration after initial setup, which restricts our ability to adapt to evolving project needs. The primary requirement for the new cluster is to support heavy AI and machine learning workloads, which need significantly more computational power, including GPUs. By leveraging Istio, which is natively supported in Azure Kubernetes Service (AKS), we aim to implement a multi-cluster mesh that improves connectivity and ease of management across our clusters.
TODO
[x] #204
[x] #205
Deploy the cluster within Azure using Terraform.
Select and configure the appropriate node pool according to identified requirements.
[x] #211
[x] #206
Install Istio on the new Kubernetes cluster.
Configure Istio to enable seamless communication and management between the two clusters.
[x] #217
Create one instance of the ollama deployment on the cluster that has GPUs.
Create one deployment of open-webui on the cluster that does not have GPUs.
Connect open-webui to ollama by setting the OLLAMA_BASE_URL environment variable.
[x] #207
Set up node labels and taints to organize nodes effectively based on their capacities and intended usage.
Use Kubernetes affinity and anti-affinity rules to ensure optimal allocation and scheduling of workloads.
[x] #208
Conduct tests to ensure the new cluster and its node pools are configured correctly.
Run AI-based computational tasks to validate the performance enhancements achieved with the new setup.
[x] #209
Document the entire setup and configuration process.
Provide training and support to team members to adapt to the new Kubernetes environment.
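The scheduling and connection steps in the list above (taints, node affinity, and OLLAMA_BASE_URL) can be sketched as follows. This is a minimal illustration, not the project's actual configuration: the node label, the taint, the service DNS name, the deployment names and the container images are all assumptions to be replaced with the real values. It builds the two Deployment manifests as plain Python dictionaries and prints them as JSON.

```python
import json

# All values below are hypothetical placeholders, not the project's real settings.
GPU_LABEL = {"workload-type": "gpu"}  # assumed label on the GPU node pool
GPU_TAINT = {"key": "sku", "value": "gpu", "effect": "NoSchedule"}  # assumed taint
# Assumed service name; 11434 is Ollama's default port. This presumes the mesh
# makes the remote service resolvable under its in-cluster DNS name.
OLLAMA_URL = "http://ollama.ollama.svc.cluster.local:11434"


def deployment(name, image, port, env=None, gpu=False):
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        "env": [{"name": k, "value": v} for k, v in (env or {}).items()],
    }
    pod_spec = {"containers": [container]}
    if gpu:
        # Request one GPU (resource name exposed by the NVIDIA device plugin).
        container["resources"] = {"limits": {"nvidia.com/gpu": 1}}
        # Tolerate the GPU pool's taint so the pod may be scheduled there.
        pod_spec["tolerations"] = [dict(GPU_TAINT, operator="Equal")]
        # Require a node that carries the GPU label.
        expressions = [
            {"key": k, "operator": "In", "values": [v]}
            for k, v in GPU_LABEL.items()
        ]
        pod_spec["affinity"] = {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{"matchExpressions": expressions}]
                }
            }
        }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {"metadata": {"labels": {"app": name}}, "spec": pod_spec},
        },
    }


# GPU cluster: a single Ollama instance pinned to the GPU node pool.
ollama = deployment("ollama", "ollama/ollama", 11434, gpu=True)
# Non-GPU cluster: the web UI, pointed at Ollama through OLLAMA_BASE_URL.
open_webui = deployment(
    "open-webui",
    "ghcr.io/open-webui/open-webui:main",
    8080,
    env={"OLLAMA_BASE_URL": OLLAMA_URL},
)

if __name__ == "__main__":
    print(json.dumps([ollama, open_webui], indent=2))
```

Because the taint keeps other workloads off the GPU nodes while the affinity rule keeps Ollama on them, both halves are needed: a toleration alone would only permit, not require, placement on the GPU pool.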
References
Istio multicluster mesh
Azure Istio service mesh
AKS GPU workloads