GoogleCloudPlatform / terraform-example-foundation-app

https://registry.terraform.io/modules/GoogleCloudPlatform/terraform-example-foundation-app/google
Apache License 2.0

Cluster does not have enough resources to deploy Bank of Anthos #54

Closed daniel-cit closed 3 years ago

daniel-cit commented 3 years ago

Expected Behavior

After running all steps in the READMEs, Bank of Anthos is deployed in the clusters.

Actual Behavior

Most of the services fail due to lack of resources, e.g.:

pods "ledgerwriter-5df95fb547-4c7d2" is forbidden: exceeded quota: transactions, requested: limits.cpu=2700m, used: limits.cpu=5400m, limited: limits.cpu=6
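To make the arithmetic behind this quota error explicit, here is a minimal Python sketch that parses the message and checks how far the request overshoots the quota. The helper names (`cpu_to_millicores`, `parse_cpu_quota`) are hypothetical and not part of any Kubernetes tooling; only the message text comes from the issue.

```python
import re

# Quota error copied verbatim from the issue.
MESSAGE = (
    'pods "ledgerwriter-5df95fb547-4c7d2" is forbidden: exceeded quota: '
    "transactions, requested: limits.cpu=2700m, used: limits.cpu=5400m, "
    "limited: limits.cpu=6"
)


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '2700m' or '6' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_cpu_quota(message: str) -> dict:
    """Extract the requested, used and limited limits.cpu values (hypothetical helper)."""
    values = {}
    for key in ("requested", "used", "limited"):
        match = re.search(rf"{key}: limits\.cpu=([0-9.]+m?)", message)
        values[key] = cpu_to_millicores(match.group(1))
    return values


quota = parse_cpu_quota(MESSAGE)
needed = quota["used"] + quota["requested"]   # what the namespace would use
overshoot = needed - quota["limited"]         # how far past the quota that is

print(f"needed={needed}m limit={quota['limited']}m overshoot={overshoot}m")
```

The new pod would bring the namespace to 8100m against a 6000m quota, so the pod is rejected at admission regardless of how much capacity the nodes have.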

The cluster cannot be scaled up to more than 4 machines due to IP_SPACE_EXHAUSTED errors in the instance groups of the node pools:

IP_SPACE_EXHAUSTED  gke-gke-1-boa-d-us-east1-np-us-east1-89b4eb38-505x  us-east1-c  Creating    May 17, 2021, 8:44:53 AM UTC-03:00  Instance 'gke-gke-1-boa-d-us-east1-np-us-east1-89b4eb38-505x' creation failed: IP space of 'projects/prj-d-shared-base-d331/regions/us-east1/subnetworks/gke-cluster1-su-pod-ip-range-9682ef0a15e2d26b' is exhausted.

The configured pod-ip-range is 100.65.64.0/22, which gives us only 4 chunks of 256 addresses, one chunk per instance (see the Pod IP addresses info).
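The node ceiling follows directly from the range size. A minimal sketch using Python's standard `ipaddress` module, assuming (as stated above) that each node reserves a 256-address /24 block from the pod range; `max_nodes` is a hypothetical helper name:

```python
import ipaddress

# Assumption taken from the issue text: every node reserves one /24
# (256 addresses) out of the pod secondary range.
PER_NODE_PREFIX = 24


def max_nodes(pod_range: str, per_node_prefix: int = PER_NODE_PREFIX) -> int:
    """Number of per-node blocks that fit into the pod range."""
    network = ipaddress.ip_network(pod_range)
    if network.prefixlen > per_node_prefix:
        return 0  # range is smaller than a single per-node block
    return 2 ** (per_node_prefix - network.prefixlen)


# The four blocks available in the currently configured range.
blocks = [str(b) for b in ipaddress.ip_network("100.65.64.0/22").subnets(new_prefix=24)]

print(max_nodes("100.65.64.0/22"))  # 4
print(blocks)
```

Under the same assumption, a /19 pod range would allow 32 nodes (`max_nodes("100.66.0.0/19")`).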

Most of the cluster resources are used by Istio.

Steps to Reproduce the Problem

  1. Follow all the steps to deploy the foundation-app.

Specifications

daniel-cit commented 3 years ago

Documentation related to cluster CIDR sizing

[DROPPED PROPOSAL] Proposed change

DEVELOPMENT

Cluster 1 node CIDR

| before | after |
| --- | --- |
| 10.0.65.0/29 | 10.0.65.0/27 |

Cluster 2 node CIDR

| before | after |
| --- | --- |
| 10.1.64.0/29 | 10.1.64.0/27 |

MCI (keep value)

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.64.64.0/22 | 100.64.64.0/22 |
| services-ip-range | 100.64.68.0/26 | 100.64.68.0/26 |
| master_cidr | 100.64.70.0/28 | 100.64.70.0/28 |

Cluster 1

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.64.72.0/22 | 100.65.0.0/19 |
| services-ip-range | 100.64.76.0/26 | 100.65.32.0/23 |
| master_cidr | 100.64.78.0/28 | 100.65.34.0/28 |

Cluster 2

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.65.64.0/22 | 100.66.0.0/19 |
| services-ip-range | 100.65.68.0/26 | 100.66.32.0/23 |
| master_cidr | 100.65.70.0/28 | 100.66.34.0/28 |

NON-PRODUCTION

Cluster 1 node CIDR

| before | after |
| --- | --- |
| 10.0.129.0/29 | 10.0.129.0/27 |

Cluster 2 node CIDR

| before | after |
| --- | --- |
| 10.1.128.0/29 | 10.1.128.0/27 |

MCI (keep value)

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.64.128.0/22 | 100.64.128.0/22 |
| services-ip-range | 100.64.132.0/26 | 100.64.132.0/26 |
| master_cidr | 100.64.134.0/28 | 100.64.134.0/28 |

Cluster 1

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.64.136.0/22 | 100.65.64.0/19 |
| services-ip-range | 100.64.140.0/26 | 100.65.96.0/23 |
| master_cidr | 100.64.142.0/28 | 100.65.98.0/28 |

Cluster 2

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.65.128.0/22 | 100.66.64.0/19 |
| services-ip-range | 100.65.132.0/26 | 100.66.96.0/23 |
| master_cidr | 100.65.134.0/28 | 100.66.98.0/28 |

PRODUCTION

Cluster 1 node CIDR

| before | after |
| --- | --- |
| 10.0.193.0/29 | 10.0.193.0/27 |

Cluster 2 node CIDR

| before | after |
| --- | --- |
| 10.1.192.0/29 | 10.1.192.0/27 |

MCI (keep value)

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.64.192.0/22 | 100.64.192.0/22 |
| services-ip-range | 100.64.196.0/26 | 100.64.196.0/26 |
| master_cidr | 100.64.198.0/28 | 100.64.198.0/28 |

Cluster 1

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.64.200.0/22 | 100.65.128.0/19 |
| services-ip-range | 100.64.204.0/26 | 100.65.160.0/23 |
| master_cidr | 100.64.206.0/28 | 100.65.162.0/28 |

Cluster 2

| secondary range | before | after |
| --- | --- | --- |
| pod-ip-range | 100.65.192.0/22 | 100.66.128.0/19 |
| services-ip-range | 100.65.196.0/26 | 100.66.160.0/23 |
| master_cidr | 100.65.198.0/28 | 100.66.162.0/28 |
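
A re-addressing plan like this one is easy to get wrong by hand, so a quick overlap check can help. The sketch below uses Python's standard `ipaddress` module on the "after" values listed in this comment; the `PROPOSED` dictionary and its labels are only an illustrative way of organizing them, not part of the repository.

```python
import ipaddress
from itertools import combinations

# "After" secondary ranges copied from the proposal, grouped by environment.
# Order within each list: MCI, cluster 1, cluster 2 (pod, services, master).
PROPOSED = {
    "development": [
        "100.64.64.0/22", "100.64.68.0/26", "100.64.70.0/28",
        "100.65.0.0/19", "100.65.32.0/23", "100.65.34.0/28",
        "100.66.0.0/19", "100.66.32.0/23", "100.66.34.0/28",
    ],
    "non-production": [
        "100.64.128.0/22", "100.64.132.0/26", "100.64.134.0/28",
        "100.65.64.0/19", "100.65.96.0/23", "100.65.98.0/28",
        "100.66.64.0/19", "100.66.96.0/23", "100.66.98.0/28",
    ],
    "production": [
        "100.64.192.0/22", "100.64.196.0/26", "100.64.198.0/28",
        "100.65.128.0/19", "100.65.160.0/23", "100.65.162.0/28",
        "100.66.128.0/19", "100.66.160.0/23", "100.66.162.0/28",
    ],
}

# ip_network() raises ValueError if a CIDR has host bits set,
# so this also validates that every range is aligned to its prefix.
networks = [
    ipaddress.ip_network(cidr)
    for ranges in PROPOSED.values()
    for cidr in ranges
]

# Compare every pair of ranges across all environments.
overlaps = [
    (str(a), str(b))
    for a, b in combinations(networks, 2)
    if a.overlaps(b)
]

print(overlaps if overlaps else "no overlapping ranges")
```

The check only covers the secondary ranges from this comment; the node CIDRs (10.x ranges) would need the same treatment against the rest of the shared VPC.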

daniel-cit commented 3 years ago

Alternative to changing the CIDR ranges:

Data used

Note: the istio-proxy sidecar adds 2 CPUs to the CPU limit of every pod it is injected into.
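
A rough sketch of what that sidecar overhead means for the `transactions` quota from the original error. The numbers come from this thread; the assumption (not confirmed here) is that the 2700m requested by the ledgerwriter pod already includes the 2000m istio-proxy limit.

```python
QUOTA_MILLICORES = 6000      # limits.cpu=6 from the quota error
POD_LIMIT_MILLICORES = 2700  # limits.cpu requested by the ledgerwriter pod
SIDECAR_MILLICORES = 2000    # istio-proxy CPU limit per pod (note above)

# Assumption: the pod's 2700m is the application limit plus the sidecar limit.
app_millicores = POD_LIMIT_MILLICORES - SIDECAR_MILLICORES

# Share of each pod's CPU limit that goes to the sidecar.
sidecar_share = SIDECAR_MILLICORES / POD_LIMIT_MILLICORES

# How many such pods fit under the namespace quota with and without the sidecar.
pods_with_sidecar = QUOTA_MILLICORES // POD_LIMIT_MILLICORES
pods_without_sidecar = QUOTA_MILLICORES // app_millicores

print(f"application limit per pod: {app_millicores}m")
print(f"sidecar share of pod limit: {sidecar_share:.0%}")
print(f"pods that fit with sidecar: {pods_with_sidecar}")
print(f"pods that fit without sidecar: {pods_without_sidecar}")
```

Two pods at 2700m each account for the 5400m already in use when the third pod was rejected, which is consistent with the error message.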