Open fmichaelobrien opened 4 months ago
Testing US clusters now from
michael@cloudshell:~$ gcloud config set project kcc-timeout-cso
Updated property [core/project].
michael@cloudshell:~ (kcc-timeout-cso)$
export CLUSTER=kcc
export REGION=us-central1
export NETWORK=kcc-vpc
export SUBNET=kcc-sn
export CIDR_KCC_VPC=192.168.0.0/16
gcloud services enable krmapihosting.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable accesscontextmanager.googleapis.com
gcloud services enable cloudbilling.googleapis.com
gcloud services enable serviceusage.googleapis.com
gcloud services enable servicedirectory.googleapis.com
gcloud services enable dns.googleapis.com
gcloud services enable anthos.googleapis.com
# na only
gcloud compute networks create "$NETWORK" --subnet-mode=custom
# all of google
gcloud compute networks create "default"
fix
- Location REGION:us-east4 violates constraint constraints/gcp.resourceLocations on the resource projects/kcc-timeout-cso/regions/us-east4/subnetworks/kcc-us-east4-sn.
https://console.cloud.google.com/iam-admin/orgpolicies/gcp-resourceLocations?orgonly=true&project=kcc-timeout-cso&supportedpurview=organizationId
gcloud compute networks subnets create "kcc-us-east4-sn" --network "$NETWORK" --range "$CIDR_KCC_VPC" --region "us-east4" --stack-type=IPV4_ONLY
gcloud anthos config controller create "$CLUSTER" --location "$REGION" --network "$NETWORK" --subnet "$SUBNET" --master-ipv4-cidr-block="172.16.0.128/28" --full-management
at the org level - but kcc will revert it
takes 5 min to propagate
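A minimal sketch of overriding the location constraint at the project level rather than the org level. It assumes the legacy `gcloud resource-manager org-policies set-policy` command and an allow-all list policy; the helper name `write_allow_all_policy` and the `/tmp/policy.yaml` path are hypothetical, and the gcloud call is left as a comment because it was not run here.

```shell
# Hypothetical helper: write a legacy (v1) list-policy file that allows
# all values for a single org-policy constraint.
write_allow_all_policy() {
  constraint="$1"
  out="$2"
  cat > "$out" <<EOF
constraint: ${constraint}
listPolicy:
  allValues: ALLOW
EOF
}

# Assumed usage (not executed in this sketch):
#   write_allow_all_policy constraints/gcp.resourceLocations /tmp/policy.yaml
#   gcloud resource-manager org-policies set-policy /tmp/policy.yaml --project=kcc-timeout-cso
#   sleep 300   # the notes above mention about 5 minutes to propagate
```

An allow-all policy is the bluntest option; a narrower allowed-values list may be preferable where the org permits it.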
michael@cloudshell:~ (kcc-timeout-cso)$ gcloud compute networks subnets create "kcc-us-east4-sn" --network "$NETWORK" --range "$CIDR_KCC_VPC" --region "us-east4" --stack-type=IPV4_ONLY
Created [https://www.googleapis.com/compute/v1/projects/kcc-timeout-cso/regions/us-east4/subnetworks/kcc-us-east4-sn].
NAME: kcc-us-east4-sn
REGION: us-east4
NETWORK: kcc-vpc
RANGE: 192.168.0.0/16
STACK_TYPE: IPV4_ONLY
IPV6_ACCESS_TYPE:
INTERNAL_IPV6_PREFIX:
EXTERNAL_IPV6_PREFIX:
michael@cloudshell:~ (kcc-timeout-cso)$ gcloud anthos config controller create "$CLUSTER" --location "us-east4" --network "kcc-vpc" --subnet "kcc-us-east4-sn" --master-ipv4-cidr-block="172.16.0.128/28" --full-management
Create request issued for: [kcc]
Waiting for operation [projects/kcc-timeout-cso/locations/us-east4/operations/operation-1709056068060-612609fd68669-b51f242c-8b2a2484] to complete...working..
1245
forgot to override the peering constraint
ERROR: (gcloud.anthos.config.controller.create) unexpected error occurred while waiting for SLM operation [projects/krmapihosting-slm/locations/us-east4/operations/operation-1709056076620-61260a05921ae-b6751cf1-af711ba3]: errored while waiting for operation: projects/krmapihosting-slm/locations/us-east4/operations/operation-1709056076620-61260a05921ae-b6751cf1-af711ba3: Operation failed with error:
generic::invalid_argument: terraform apply failed, error: exit status 1, stderr:
Error: Error waiting for creating GKE cluster: Constraint constraints/compute.restrictVpcPeering violated for project 59969913664. Peering the network projects/gke-prod-us-east4-0839/global/networks/gke-ncca9b3ff9ac9f6c6986-8201-7368-net is not allowed.
on main_autopilot.tf line 32, in resource "google_container_cluster" "acp_cluster":
32: resource "google_container_cluster" "acp_cluster" {
, stdout:
google_container_cluster.acp_cluster: Creating...
google_container_cluster.acp_cluster: Still creating... [10s elapsed]
google_container_cluster.acp_cluster: Still creating... [20s elapsed]
google_container_cluster.acp_cluster: Still creating... [30s elapsed]
google_container_cluster.acp_cluster: Still creating... [40s elapsed]
google_container_cluster.acp_cluster: Still creating... [50s elapsed]
google_container_cluster.acp_cluster: Still creating... [1m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [1m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [1m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [1m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [1m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [1m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [2m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [2m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [2m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [2m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [2m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [2m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [3m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [3m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [3m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [3m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [3m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [3m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [4m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [4m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [4m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [4m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [4m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [4m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [5m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [5m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [5m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [5m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [5m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [5m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [6m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [6m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [6m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [6m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [6m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [6m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [7m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [7m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [7m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [7m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [7m40s elapsed]
google_container_cluster.acp_cluster: Still creating... [7m50s elapsed]
google_container_cluster.acp_cluster: Still creating... [8m0s elapsed]
google_container_cluster.acp_cluster: Still creating... [8m10s elapsed]
google_container_cluster.acp_cluster: Still creating... [8m20s elapsed]
google_container_cluster.acp_cluster: Still creating... [8m30s elapsed]
google_container_cluster.acp_cluster: Still creating... [8m40s elapsed]
Subsequent cleanup succeeded
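The constraint name is easy to miss in a long failure log like the one above. A small sketch that pulls the distinct `constraints/...` names out of saved output; the helper name `list_violated_constraints` is hypothetical and the pattern only assumes names made of letters, digits, and dots.

```shell
# Hypothetical helper: read a log on stdin and print each distinct
# org-policy constraint name it mentions, one per line.
list_violated_constraints() {
  grep -oE 'constraints/[A-Za-z0-9.]+' \
    | sed 's/\.$//' \
    | sort -u   # sed drops a sentence-ending period; sort -u de-duplicates
}

# Assumed usage, with create.log being a saved copy of the gcloud output:
#   list_violated_constraints < create.log
```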
1303
michael@cloudshell:~ (kcc-timeout-cso)$ gcloud anthos config controller create "$CLUSTER" --location "us-east4" --network "kcc-vpc" --subnet "kcc-us-east4-sn" --master-ipv4-cidr-block="172.16.0.128/28" --full-management
Create request issued for: [kcc]
Waiting for operation [projects/kcc-timeout-cso/locations/us-east4/operations/operation-1709056987983-61260d6ab703e-3f497a08-b07ceb66] to complete...working..
33%
1306 55%
need to get to 83%
at 87% - 15 workloads will populate 1312 at 83%
1313 up - 10 min duration
PodUnschedulable
Reason: Cannot schedule pods: node(s) had untolerated taint {cloud.google.com/gke-quick-remove: true}. [Learn more](https://cloud.google.com/kubernetes-engine/docs/troubleshooting#PodUnschedulable)
Source: [bootstrap-6dbc584955-9j2v7](https://console.cloud.google.com/kubernetes/pod/us-east4/krmapihost-kcc/krmapihosting-system/bootstrap-6dbc584955-9j2v7?project=kcc-timeout-cso&supportedpurview=project)
script will auto delete shortly
michael@cloudshell:~ (kcc-timeout-cso)$ gcloud anthos config controller create "$CLUSTER" --location "us-east4" --network "kcc-vpc" --subnet "kcc-us-east4-sn" --master-ipv4-cidr-block="172.16.0.128/28" --full-management
Create request issued for: [kcc]
Waiting for operation [projects/kcc-timeout-cso/locations/us-east4/operations/operation-1709056987983-61260d6ab703e-3f497a08-b07ceb66] to complete...working...
Waiting for operation [projects/kcc-timeout-cso/locations/us-east4/operations/operation-1709056987983-61260d6ab703e-3f497a08-b07ceb66] to complete...working..
Waiting for operation [projects/kcc-timeout-cso/locations/us-east4/operations/operation-1709056987983-61260d6ab703e-3f497a08-b07ceb66] to complete...working
Waiting for operation [projects/kcc-timeout-cso/locations/us-east4/operations/operation-1709056987983-61260d6ab703e-3f497a08-b07ceb66] to complete...working
1315
getting better on workloads - was a red herring on pod scheduling
6 G at 1316
1317
1319 : 8 of 15
1320: 11 of 15
1321:
1322: 12 of 15
Created instance [kcc].
Fetching cluster endpoint and auth data.
kubeconfig entry generated for krmapihost-kcc.
michael@cloudshell:~ (kcc-timeout-cso)$
1322 14 of 15
Name | Status | Type | Pods | Namespace | Cluster
---|---|---|---|---|---
bootstrap | OK | Deployment | 1/1 | krmapihosting-system | krmapihost-kcc
cnrm-controller-manager-fbgrg35rhrhz7f5czo3a | OK | Stateful Set | 1/1 | cnrm-system | krmapihost-kcc
cnrm-deletiondefender | OK | Stateful Set | 1/1 | cnrm-system | krmapihost-kcc
cnrm-resource-stats-recorder | OK | Deployment | 1/1 | cnrm-system | krmapihost-kcc
cnrm-unmanaged-detector | OK | Stateful Set | 1/1 | cnrm-system | krmapihost-kcc
cnrm-webhook-manager | Does not have minimum availability | Deployment | 2/2 | cnrm-system | krmapihost-kcc
config-management-operator | OK | Deployment | 1/1 | config-management-system | krmapihost-kcc
configconnector-operator | OK | Stateful Set | 1/1 | configconnector-operator-system | krmapihost-kcc
configsync-healthcheck-service | OK | Deployment | 1/1 | configsync-healthcheck-system | krmapihost-kcc
gatekeeper-audit | OK | Deployment | 1/1 | gatekeeper-system | krmapihost-kcc
gatekeeper-controller-manager | OK | Deployment | 1/1 | gatekeeper-system | krmapihost-kcc
krmapihosting-metrics-agent | OK | Daemon Set | 3/3 | krmapihosting-monitoring | krmapihost-kcc
otel-collector | OK | Deployment | 1/1 | config-management-monitoring | krmapihost-kcc
reconciler-manager | OK | Deployment | 1/1 | config-management-system | krmapihost-kcc
resource-group-controller-manager | OK | Deployment | 1/1 | resource-group-system | krmapihost-kcc
1324: 15 of 15
total time 21 min
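The 21 minutes can be checked from the timestamps in these notes, assuming the bare four-digit numbers are same-day HHMM wall-clock stamps (create issued at 1303, 15 of 15 workloads at 1324). The helper name `elapsed_min` is hypothetical.

```shell
# Hypothetical helper: minutes between two same-day HHMM stamps.
# The body runs in a subshell so its variables do not leak to the caller.
elapsed_min() (
  start_h=${1%??}; start_m=${1#??}
  end_h=${2%??};   end_m=${2#??}
  # Strip one leading zero so values such as 09 are not read as octal.
  echo $(( (${end_h#0} * 60 + ${end_m#0}) - (${start_h#0} * 60 + ${start_m#0}) ))
)

elapsed_min 1303 1324   # prints 21
```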
michael@cloudshell:~ (kcc-timeout-cso)$ gcloud anthos config controller list
NAME: kcc
LOCATION: us-east4
STATE: RUNNING
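For scripted runs, the same check can be made without reading the output by eye. A sketch that assumes the `NAME:` / `STATE:` key-value layout shown above; the helper name `is_running` is hypothetical, and other gcloud output formats would need a different parser.

```shell
# Hypothetical helper: read `gcloud anthos config controller list` output
# on stdin and succeed only if the instance named $1 is reported RUNNING.
is_running() {
  awk -v name="$1" '
    $1 == "NAME:"  { current = $2 }
    $1 == "STATE:" && current == name && $2 == "RUNNING" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Assumed usage:
#   gcloud anthos config controller list | is_running kcc && echo "kcc is up"
```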
see either manual or scripted GKE cluster creation https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/gh766-script/solutions/setup.sh#L198C1-L198C174
gcloud anthos config controller create "$CLUSTER" --location "$REGION" --network "$NETWORK" --subnet "$SUBNET" --master-ipv4-cidr-block="172.16.0.128/28" --full-management
remove --full-management, as in https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/docs/advanced-install.md#gke-autopilot---recommended
Getting the older issue: https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/464
Client still has issues with standard GKE cluster
Reproducing on one of my orgs...