GoogleCloudPlatform / pubsec-declarative-toolkit

The GCP PubSec Declarative Toolkit is a collection of declarative solutions to help you on your Journey to Google Cloud. Solutions are designed using Config Connector and deployed using Config Controller.
Apache License 2.0

Docs: add a triage/fix scenario to the wiki for when a KRM service, such as enabling monitoring in client-landing-zone, times out intermittently during reconcile and requires a repeat kpt apply to let the dependency tree continue #865

Open fmichaelobrien opened 4 months ago

fmichaelobrien commented 4 months ago

Both a customer and I ran into this one, which requires an out-of-band fix. It occurs periodically (on only one of my two recent orgs).
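Since the failure is intermittent and a repeat apply lets the dependency tree continue, a small retry wrapper is one way to script the re-apply. This is only a sketch: the `retry` helper is hypothetical (not part of kpt or the toolkit), and the attempt count, delay, package path and timeout in the usage comment are placeholder assumptions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Assumed usage (package path and timeout are placeholders):
#   retry 3 60 kpt live apply client-landing-zone --reconcile-timeout=15m
```

A retry only helps with the timeout case; a persistent error such as a missing IAM permission will fail on every attempt and still needs a separate fix.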

Example of a working project, with the permission in place:

michael@cloudshell:~ (client-project-cso3)$ gcloud services list --enabled | grep NAME
NAME: cloudbilling.googleapis.com
NAME: cloudresourcemanager.googleapis.com
NAME: compute.googleapis.com
NAME: iam.googleapis.com
NAME: iamcredentials.googleapis.com
NAME: logging.googleapis.com
NAME: monitoring.googleapis.com
NAME: oslogin.googleapis.com
NAME: serviceusage.googleapis.com
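To triage whether a service such as monitoring actually got enabled, the listing above can be checked mechanically. The sketch below assumes the `NAME: <service>` line format shown in this output; `service_enabled` is a hypothetical helper, and enabling the service by hand is only one possible out-of-band fix, not something confirmed in this issue.

```shell
# service_enabled SERVICE
# Hypothetical helper: reads `gcloud services list --enabled` output on stdin
# and succeeds only if an exact "NAME: SERVICE" line is present.
service_enabled() {
  grep -qxF "NAME: $1"
}

# Assumed usage (project ID taken from the prompt above):
#   gcloud services list --enabled --project=client-project-cso3 \
#     | service_enabled monitoring.googleapis.com \
#     || gcloud services enable monitoring.googleapis.com --project=client-project-cso3
```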

See also https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/801 and https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/807

One org, obrien.industries, is working with the log sinks; the other, newer org, cloud-setup, is not.

The issue is likely missing IAM permissions on the clean account cloud-setup.org, whereas the older org obrien.industries (below), which even had an older hub-env, is OK.

Update: the same issue occurs on the second org. It looks like logging-sa needs roles/storage.admin.

https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/solutions/core-landing-zone/lz-folder/audits/logging-project/cloud-storage-buckets.yaml#L20 is missing permissions that are already set in https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/solutions/core-landing-zone/namespaces/logging.yaml#L82
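If roles/storage.admin does turn out to be what logging-sa is missing, an out-of-band grant could be prepared as below. This is a sketch under assumptions: `storage_admin_grant_cmd` is a hypothetical helper that only prints the command for review, and binding the role at the logging project level is my guess at the right scope, not something this issue confirms.

```shell
# storage_admin_grant_cmd PROJECT_ID SERVICE_ACCOUNT_EMAIL
# Hypothetical helper: prints (does not run) a gcloud command that would bind
# roles/storage.admin to the service account on the given project.
storage_admin_grant_cmd() {
  printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=roles/storage.admin\n' "$1" "$2"
}

# Assumed usage (IDs taken from the output in this issue):
#   storage_admin_grant_cmd logging-project-oi0130 logging-sa@kcc-oi-7970.iam.gserviceaccount.com
```

Printing the command first keeps the change reviewable; a longer-term fix would presumably belong in the package YAML so that the binding is reconciled rather than applied by hand.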


Both have logging-sa as Logging Admin at the org level:

Principal | Name | Role
-- | -- | --
logging-sa@kcc-cso-4380.iam.gserviceaccount.com | logging-sa | Logging Admin

and as Monitoring Admin at the KCC project level:

Principal | Name | Role
-- | -- | --
logging-sa@kcc-cso-4380.iam.gserviceaccount.com | logging-sa | Logging Admin, Monitoring Admin

setters.yaml

apiVersion: v1
kind: ConfigMap
metadata: # kpt-merge: /setters
  name: setters
  annotations:
    config.kubernetes.io/local-config: "true"
    internal.kpt.dev/upstream-identifier: '|ConfigMap|default|setters'
data:
  org-id: "45..2144"
  lz-folder-id: "388..43"
  billing-id: "014...F85"
  management-project-id: "kcc-oi-7970"
  management-project-number: "729..84"
  management-namespace: config-control
  allowed-trusted-image-projects: |
    - "projects/cos-cloud"
  allowed-contact-domains: |
    - "@obrien.industries"
  allowed-policy-domain-members: |
    - "C03kdhrkc"
  allowed-vpc-peering: |
    - "under:organizations/459...44"
  logging-project-id: logging-project-oi0130
  security-log-bucket: security-log-bucket-oi0130
  platform-and-component-log-bucket: platform-and-component-log-bucket-oi0130
  retention-locking-policy: "false"
  retention-in-days: "1"
  dns-project-id: dns-project-oi0130
  dns-name: "obrien.industries."

Single-service IAM issue:

michael@cloudshell:~/kcc-oi-20231206/kpt (kcc-oi-7970)$ kubectl get gcp -n logging
NAME                                                                                      AGE   READY   STATUS     STATUS AGE
logginglogbucket.logging.cnrm.cloud.google.com/platform-and-component-log-bucket-oi0130   17h   True    UpToDate   17h
logginglogbucket.logging.cnrm.cloud.google.com/security-log-bucket                        17h   True    UpToDate   17h

NAME                                                                                                AGE   READY   STATUS     STATUS AGE
logginglogsink.logging.cnrm.cloud.google.com/logging-project-oi0130-data-access-sink                17h   True    UpToDate   17h
logginglogsink.logging.cnrm.cloud.google.com/mgmt-project-cluster-platform-and-component-log-sink   17h   True    UpToDate   17h
logginglogsink.logging.cnrm.cloud.google.com/org-log-sink-data-access-logging-project-oi0130        17h   True    UpToDate   17h
logginglogsink.logging.cnrm.cloud.google.com/org-log-sink-security-logging-project-oi0130           17h   True    UpToDate   17h
logginglogsink.logging.cnrm.cloud.google.com/platform-and-component-services-infra-log-sink         17h   True    UpToDate   17h
logginglogsink.logging.cnrm.cloud.google.com/platform-and-component-services-log-sink               17h   True    UpToDate   17h

NAME                                                                      AGE   READY   STATUS     STATUS AGE
monitoringmonitoredproject.monitoring.cnrm.cloud.google.com/kcc-oi-7970   17h   True    UpToDate   17h

NAME                                                                       AGE   READY   STATUS         STATUS AGE
storagebucket.storage.cnrm.cloud.google.com/security-incident-log-bucket   17h   False   UpdateFailed   17h
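With many resources in a namespace, the one failing row is easy to miss. A minimal filter over the `kubectl get gcp` table, assuming the five-column layout shown above (READY in the third column, STATUS in the fourth); `not_ready` is a hypothetical helper name.

```shell
# not_ready
# Hypothetical helper: reads `kubectl get gcp` table output on stdin and prints
# "<resource> <status>" for each row whose READY column is not "True".
# Header rows, blank lines, and rows with fewer than four fields are skipped.
not_ready() {
  awk '$1 != "NAME" && NF >= 4 && $3 != "True" { print $1, $4 }'
}

# Assumed usage:
#   kubectl get gcp -n logging | not_ready
```

Rows that have no READY value yet have fewer fields and are skipped by this sketch, so an empty result does not by itself prove that everything has reconciled.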

michael@cloudshell:~/kcc-oi-20231206/kpt (kcc-oi-7970)$ kubectl describe storagebucket.storage.cnrm.cloud.google.com/security-incident-log-bucket -n logging
Name:         security-incident-log-bucket
Namespace:    logging
Labels:       <none>
Annotations:  cnrm.cloud.google.com/blueprint: kpt-pkg-fn-live
              cnrm.cloud.google.com/management-conflict-prevention-policy: none
              cnrm.cloud.google.com/project-id: logging-project-oi0130
              cnrm.cloud.google.com/state-into-spec: merge
              config.k8s.io/owning-inventory: 8bee7142b357086a1a649139f252ed0f59791b0e-1706652642396373649
              config.kubernetes.io/depends-on: resourcemanager.cnrm.cloud.google.com/namespaces/projects/Project/logging-project-oi0130
              internal.kpt.dev/upstream-identifier: storage.cnrm.cloud.google.com|StorageBucket|logging|security-incident-log-bucket
API Version:  storage.cnrm.cloud.google.com/v1beta1
Kind:         StorageBucket
Metadata:
  Creation Timestamp:  2024-01-30T22:17:21Z
  Generation:          1
  Resource Version:    45866
  UID:                 72c94615-c235-4c08-beda-7bb823eaea08
Spec:
  Autoclass:
    Enabled:                 true
  Location:                  northamerica-northeast1
  Public Access Prevention:  enforced
  Retention Policy:
    Is Locked:                  false
    Retention Period:           86400
  Uniform Bucket Level Access:  true
Status:
  Conditions:
    Last Transition Time:  2024-01-30T22:17:21Z
    Message:               Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing Storage Bucket "security-incident-log-bucket": googleapi: Error 403: logging-sa@kcc-oi-7970.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist)., forbidden
    Reason:                UpdateFailed
    Status:                False
    Type:                  Ready
  Observed Generation:     1
Events:
  Type     Reason        Age                    From                      Message
  ----     ------        ----                   ----                      -------
  Warning  UpdateFailed  4m17s (x531 over 17h)  storagebucket-controller  Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing Storage Bucket "security-incident-log-bucket": googleapi: Error 403: logging-sa@kcc-oi-7970.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist)., forbidden
michael@cloudshell:~/kcc-oi-20231206/kpt (kcc-oi-7970)$
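The useful part of the describe output is the denied permission buried in the long message. A small sketch that pulls it out, assuming the `Permission '<name>' denied` wording seen in the 403 above; `denied_permissions` is a hypothetical helper name.

```shell
# denied_permissions
# Hypothetical helper: reads text on stdin and prints each distinct permission
# named in a "Permission '<name>' denied" message, one per line.
denied_permissions() {
  sed -n "s/.*Permission '\([^']*\)' denied.*/\1/p" | sort -u
}

# Assumed usage:
#   kubectl describe storagebucket security-incident-log-bucket -n logging | denied_permissions
```

The message repeats under both Conditions and Events, so the `sort -u` step collapses the duplicates. Note that the 403 text itself says the resource "may not exist", so the extracted permission is a starting point for triage, not a diagnosis.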