A good example of bug chasing for 30 minutes (nothing related to actual coding, dev, or devops) - just a typo. Hint: the problem was not in any of the actual Terraform code.
Fixed with a single-file change.
Applying a log sink on the prod project to start, with an existing bucket as the target; will switch to Pub/Sub later.
Preview of logs from the Log Router pane:
Step #3 - "tf plan": Terraform will perform the following actions:
Step #3 - "tf plan":
Step #3 - "tf plan": # module.project-level-log-sink.google_logging_project_sink.my-sink will be created
Step #3 - "tf plan": + resource "google_logging_project_sink" "my-sink" {
Step #3 - "tf plan": + destination = "logging.googleapis.com/projects/tzpe-tlz-audittlz-tlz/locations/northamerica-northeast1/buckets/20231015tlz"
Step #3 - "tf plan": + filter = "resource.type = gce_instance AND severity >= INFO"
Step #3 - "tf plan": + id = (known after apply)
Step #3 - "tf plan": + name = "20231015-sink"
Step #3 - "tf plan": + project = "tzpe-tlz-tlzprod-host4"
Step #3 - "tf plan": + unique_writer_identity = true
Step #3 - "tf plan": + writer_identity = (known after apply)
Step #3 - "tf plan":
Step #3 - "tf plan": + bigquery_options {
Step #3 - "tf plan": + use_partitioned_tables = (known after apply)
Step #3 - "tf plan": }
Step #3 - "tf plan": }
Step #3 - "tf plan":
Step #3 - "tf plan": # module.service_accounts.data.template_file.keys["sa"] will be read during apply
Step #3 - "tf plan": # (config refers to values not yet known)
Step #3 - "tf plan": <= data "template_file" "keys" {
Step #3 - "tf plan": ~ id = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" -> (known after apply)
Step #3 - "tf plan": + rendered = (known after apply)
Step #3 - "tf plan": # (2 unchanged attributes hidden)
Step #3 - "tf plan": }
Step #3 - "tf plan":
Step #3 - "tf plan": # module.net-host-prj.module.project.google_project.project will be updated in-place
Step #3 - "tf plan": ~ resource "google_project" "project" {
Step #3 - "tf plan": id = "projects/tzpe-tlz-tlzprod-host4"
Step #3 - "tf plan": ~ labels = {
Step #3 - "tf plan": - "date_modified" = "2023-10-16"
Step #3 - "tf plan": } -> (known after apply)
Step #3 - "tf plan": name = "TzPe-tlz-tlzprod-host4"
Step #3 - "tf plan": # (5 unchanged attributes hidden)
Step #3 - "tf plan": }
Step #3 - "tf plan":
Step #3 - "tf plan": # module.net-host-prj.module.network["tlzprod-svpc"].module.subnets["prsubnet02"].google_compute_subnetwork.subnetwork will be updated in-place
Step #3 - "tf plan": ~ resource "google_compute_subnetwork" "subnetwork" {
Step #3 - "tf plan": id = "projects/tzpe-tlz-tlzprod-host4/regions/northamerica-northeast1/subnetworks/tzpecnr-prsubnet02-host4-snet"
Step #3 - "tf plan": name = "tzpecnr-prsubnet02-host4-snet"
Step #3 - "tf plan": # (13 unchanged attributes hidden)
Step #3 - "tf plan":
Step #3 - "tf plan": ~ log_config {
Step #3 - "tf plan": - metadata = "EXCLUDE_ALL_METADATA" -> null
Step #3 - "tf plan": # (4 unchanged attributes hidden)
Step #3 - "tf plan": }
Step #3 - "tf plan": }
Step #3 - "tf plan":
Step #3 - "tf plan": Plan: 1 to add, 2 to change, 0 to destroy.
module.service_accounts.data.template_file.keys["sa"]: Read complete after 0s [id=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855]
module.net-host-prj.module.network["tlzprod-svpc"].module.subnets["prsubnet02"].google_compute_subnetwork.subnetwork: Modifying... [id=projects/tzpe-tlz-tlzprod-host4/regions/northamerica-northeast1/subnetworks/tzpecnr-prsubnet02-host4-snet]
module.net-host-prj.module.network["tlzprod-svpc"].module.subnets["prsubnet02"].google_compute_subnetwork.subnetwork: Modifications complete after 1s [id=projects/tzpe-tlz-tlzprod-host4/regions/northamerica-northeast1/subnetworks/tzpecnr-prsubnet02-host4-snet]
module.project-level-log-sink.google_logging_project_sink.my-sink: Creating...
module.project-level-log-sink.google_logging_project_sink.my-sink: Creation complete after 2s [id=projects/tzpe-tlz-tlzprod-host4/sinks/20231015-sink]
Fixed the hardcoded audit project to pick up the prod shared VPC project, removed the GCE-specific log filter, and moved the bucket to prod from audit. Still pending: code to create the bucket.
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket
We can also target Splunk.
Logging bucket vs cloud storage bucket https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/logging_project_bucket_config
resource "google_logging_project_bucket_config" "prod-log-sink-bucket" {
project = var.project_id
location = var.region1
retention_days = 30
#enable_analytics = true # N/A yet
bucket_id = var.bucket_name
}
Step #4 - "tf apply": module.project-level-log-sink.google_logging_project_bucket_config.analytics-enabled-bucket: Creating...
Step #4 - "tf apply": module.project-level-log-sink.google_logging_project_bucket_config.analytics-enabled-bucket: Creation complete after 1s [id=projects/tzpe-tlz-tlzprod-host4/locations/northamerica-northeast1/buckets/20231015-prod-sink]
Redeploying with renamed sinks and buckets. Note: the log storage bucket does not get removed if the variable name is also changed - https://console.cloud.google.com/logs/storage?project=tzpe-tlz-tlzprod-host4&supportedpurview=project
Create a GCS bucket for the 2nd (GCS) log sink destination and remove the filter (sketch below) - https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket
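A minimal sketch of that pending GCS bucket resource (the variable names var.gcs_bucket_name / var.project_id / var.region1 and the lifecycle rule are assumptions, not the merged code):

# Sketch only - GCS bucket used as the 2nd log sink destination
resource "google_storage_bucket" "prod-log-sink-gcs-bucket" {
  name                        = var.gcs_bucket_name # e.g. "20231015-prod-sink-gcs"
  project                     = var.project_id
  location                    = var.region1 # northamerica-northeast1
  storage_class               = "STANDARD"
  uniform_bucket_level_access = true
  force_destroy               = false

  # optional: age out exported log objects after 30 days
  lifecycle_rule {
    condition {
      age = 30
    }
    action {
      type = "Delete"
    }
  }
}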
Logs take up to an hour to show up - do some VM start/stop cycles to generate logs first.
preview logs
PR 1 of 2-3
root_@cloudshell:~/lz-tls/_lz2/_upsource/pbmm-on-gcp-onboarding (lz-tls)$ git status
On branch 318-log-sink-alerting
Your branch is up to date with 'origin/318-log-sink-alerting'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: environments/prod/main.tf
new file: environments/prod/prod-logging.auto.tfvars
modified: environments/prod/variables.tf
new file: modules/23-logging/main.tf
new file: modules/23-logging/outputs.tf
new file: modules/23-logging/variables.tf
new file: modules/24-gcs-bucket/main.tf
new file: modules/24-gcs-bucket/outputs.tf
new file: modules/24-gcs-bucket/variables.tf
root_@cloudshell:~/lz-tls/_lz2/_upsource/pbmm-on-gcp-onboarding (lz-tls)$ git push origin 318-log-sink-alerting
Username for 'https://github.com': obriensystems
Password for 'https://obriensystems@github.com':
Enumerating objects: 20, done.
Counting objects: 100% (20/20), done.
Delta compression using up to 4 threads
Compressing objects: 100% (14/14), done.
Writing objects: 100% (14/14), 3.11 KiB | 1.55 MiB/s, done.
Total 14 (delta 9), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (9/9), completed with 6 local objects.
To https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding.git
2caef98..c6edbf6 318-log-sink-alerting -> 318-log-sink-alerting
https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/pull/334
Buckets coming up
Review the KCC version of our logging project and its sinks (61GB): https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/solutions/client-landing-zone/logging-project/cloud-logging-bucket.yaml
https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/634 https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/446
The terraform.landing.systems buckets created via Terraform are up.
Storage bucket and logging bucket created for the 2 routers in bigquery-ol at the org scope, 20231106:1630.
1705: GCS entries show up later.
Screenshots: sink details; comparing permissions between the busted org and the working org.
Reviewing the results of https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/pull/333/files#diff-2f2b3d2889b990647e43f24f860b0d08898940ea273d175ae257fed4339431f7
Noticed that the GCS sink is routing to a Log Router (logging) bucket instead of a GCS storage bucket - which is why log storage is filling up but the GCS bucket is not.
Log Router: prod-log-sink, prod-log-gcs-sink
Log storage: prod-sink-gcs
Sink details - GCS
Sink details - log storage
Triage - switch the GCS sink configuration:
gcs_bucket_name = "20231015-prod-sink-gcs"
resource "google_logging_project_sink" "prod-log-sink-to-gcs-bucket" {
name = var.gcs_sink_name
project = var.project_id
# Can export to pubsub, cloud storage, bigquery, log bucket, or another project
#destination = "pubsub.googleapis.com/projects/my-project/topics/instance-activity"
#destination = "logging.googleapis.com/projects/${var.project_id}/locations/${var.region1}/buckets/${var.gcs_bucket_name}"
destination = "storage.googleapis.com/${var.gcs_bucket_name}"
resource "google_logging_project_sink" "prod-log-sink-to-log-bucket" {
name = var.log_sink_name
project = var.project_id
# Can export to pubsub, cloud storage, bigquery, log bucket, or another project
#destination = "pubsub.googleapis.com/projects/my-project/topics/instance-activity"
destination = "logging.googleapis.com/projects/${var.project_id}/locations/${var.region1}/buckets/${var.log_bucket_name}"
#destination = "storage.googleapis.com/[GCS_BUCKET]"
#destination = "bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/[DATASET]"
#destination = "pubsub.googleapis.com/projects/[PROJECT_ID]/topics/[TOPIC_ID]"
#destination = "logging.googleapis.com/projects/[PROJECT_ID]/locations/global/buckets/[BUCKET_ID]"
#destination = "logging.googleapis.com/projects/[PROJECT_ID]"
# Example: Log all WARN or higher severity messages relating to instances
#filter = "resource.type = gce_instance AND severity >= INFO"
# filter only by log severity - remember filter is optional
filter = "severity >= INFO"
# Use a unique writer (creates a unique service account used for writing)
unique_writer_identity = true
}
and turn off the filter.
Before doing that - found the issue: IAM permissions on the service account.
{
errorGroups: [1]
insertId: "1uvlr6ebn7"
labels: {7}
logName: "projects/tzpe-tlz-tlzprod-host4/logs/logging.googleapis.com%2Fsink_error"
receiveTimestamp: "2023-11-02T16:10:31.566259282Z"
resource: {2}
severity: "ERROR"
textPayload: "Cloud Logging sink configuration error in tzpe-tlz-tlzprod-host4, sink 20231015-prod-gcs-sink: bucket_permission_denied ()"
timestamp: "2023-11-02T16:10:30.582419156Z"
see https://cloud.google.com/logging/docs/export/configure_export_v2#gcloud_3
Missing:

| Principal | Name | Role |
| --- | --- | --- |
| service-951469276805@gcp-sa-logging.iam.gserviceaccount.com | Cloud Logging Service Account for Project 951469276805 | Storage Legacy Bucket Owner |
Get the Cloud Logging service account by determining the project number for the prod host4 project:
root_@cloudshell:~/lz-tls/_lz2/pbmm-on-gcp-onboarding (lz-tls)$ PROJECT_ID=tzpe-tlz-tlzprod-host4
root_@cloudshell:~/lz-tls/_lz2/pbmm-on-gcp-onboarding (lz-tls)$ KCC_PROJECT_NUMBER=$(gcloud projects list --filter="${PROJECT_ID}" '--format=value(PROJECT_NUMBER)')
root_@cloudshell:~/lz-tls/_lz2/pbmm-on-gcp-onboarding (lz-tls)$ echo $KCC_PROJECT_NUMBER
604049845861
How does the Cloud Logging service account get auto-assigned to the GCS bucket? https://cloud.google.com/logging/docs/buckets#which_service_accounts_are_routing_logs_to_my_bucket - "Logs Bucket Writer" in the IAM roles: none there on either org, even though the "ol" org has the SA.
I think we need "Storage Legacy Bucket Owner" - sketch below.
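A hedged Terraform sketch of that grant (the resource and variable names are assumptions, not the merged code; roles/storage.legacyBucketOwner is the "Storage Legacy Bucket Owner" role, and roles/storage.objectCreator is the documented minimum for sink export to GCS):

# Sketch - grant the sink's writer identity access to the destination GCS bucket
resource "google_storage_bucket_iam_member" "log_sink_writer" {
  bucket = var.gcs_bucket_name
  role   = "roles/storage.legacyBucketOwner" # or roles/storage.objectCreator at minimum
  # writer_identity is already returned in "serviceAccount:..." form by the provider
  member = google_logging_project_sink.prod-log-sink-to-gcs-bucket.writer_identity
}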
raising separate https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/issues/337
SA is
root_@cloudshell:~/lz-tls/_lz2/pbmm-on-gcp-onboarding (lz-tls)$ gcloud logging settings describe --project=$PROJECT_ID
kmsServiceAccountId: cmek-p604049845861@gcp-sa-logging.iam.gserviceaccount.com
loggingServiceAccountId: service-604049845861@gcp-sa-logging.iam.gserviceaccount.com
name: projects/tzpe-tlz-tlzprod-host4/settings
Check whether the existing config flag is set to true before we swap in the service account - see if it kicks in the default SA - per https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/logging_project_sink
# Use a unique writer (creates a unique service account used for writing)
unique_writer_identity = true
change
root_@cloudshell:~/lz-tls/_lz2/pbmm-on-gcp-onboarding (lz-tls)$ git diff
diff --git a/modules/23-logging/main.tf b/modules/23-logging/main.tf
index 943f4b0..8472e25 100644
--- a/modules/23-logging/main.tf
+++ b/modules/23-logging/main.tf
@@ -69,7 +69,7 @@ resource "google_logging_project_sink" "prod-log-sink-to-gcs-bucket" {
#filter = "severity >= INFO"
# Use a unique writer (creates a unique service account used for writing)
- unique_writer_identity = true
+ #unique_writer_identity = true
Results - the new SA is serviceAccount:service-604049845861@gcp-sa-logging.iam.gserviceaccount.com, and the plan forces a sink replacement:
Step #3 - "tf plan": Terraform will perform the following actions:
Step #3 - "tf plan":
Step #3 - "tf plan": # module.project-level-log-sink.google_logging_project_sink.prod-log-sink-to-gcs-bucket must be replaced
Step #3 - "tf plan": -/+ resource "google_logging_project_sink" "prod-log-sink-to-gcs-bucket" {
Step #3 - "tf plan": - disabled = false -> null
Step #3 - "tf plan": ~ id = "projects/tzpe-tlz-tlzprod-host4/sinks/20231015-prod-gcs-sink" -> (known after apply)
Step #3 - "tf plan": name = "20231015-prod-gcs-sink"
Step #3 - "tf plan": ~ unique_writer_identity = true -> false # forces replacement
Step #3 - "tf plan": ~ writer_identity = "serviceAccount:service-604049845861@gcp-sa-logging.iam.gserviceaccount.com" -> (known after apply)
Step #3 - "tf plan": # (2 unchanged attributes hidden)
Step #3 - "tf plan":
Step #3 - "tf plan": + bigquery_options {
Step #3 - "tf plan": + use_partitioned_tables = (known after apply)
Step #3 - "tf plan": }
Step #3 - "tf plan": }
Step #3 - "tf plan":
Step #3 - "tf plan": # module.service_accounts.data.template_file.keys["sa"] will be read during apply
Step #3 - "tf plan": # (config refers to values not yet known)
Step #3 - "tf plan": <= data "template_file" "keys" {
Step #3 - "tf plan": ~ id = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" -> (known after apply)
Step #3 - "tf plan": + rendered = (known after apply)
Step #3 - "tf plan": # (2 unchanged attributes hidden)
Step #3 - "tf plan": }
Step #3 - "tf plan":
will need to wait a couple hours
1320 - 2h
20240406: Closing this issue during the retrofit/rebase of this TEF V1-based/modified repo to TEF V4 standards. This issue may participate in the LZ refactor after the rebase. Query all issues related to the older V1 version via the tag https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/labels/2024-pre-tef-v4
The use case is around routing logs to 3rd-party software on-prem.
Add the centralized logging project bucket as a logging sink target from workloads, as sketched below - https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/solutions/core-landing-zone/lz-folder/audits/logging-project/cloud-logging-buckets.yaml#L41
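A minimal sketch of that pattern, assuming hypothetical variables for the workload project, the centralized logging project, and the central log bucket:

# Sketch only - a workload project sink routing to a log bucket in the central logging project
resource "google_logging_project_sink" "workload-to-central-logging" {
  name        = "workload-to-central-logging"
  project     = var.workload_project_id
  destination = "logging.googleapis.com/projects/${var.logging_project_id}/locations/${var.region1}/buckets/${var.central_log_bucket_name}"
  filter      = "severity >= INFO"
  # the unique writer identity must be granted roles/logging.bucketWriter on the destination
  unique_writer_identity = true
}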
Follow the guidance below:
https://cloud.google.com/logging/docs/routing/overview#sinks
https://cloud.google.com/logging/docs/export/aggregated_sinks
https://cloud.google.com/logging/docs/export/pubsub
https://cloud.google.com/storage/docs/pubsub-notifications
https://cloud.google.com/architecture/monitoring
Splunk Log Sink and Filtering:
https://cloud.google.com/architecture/stream-logs-from-google-cloud-to-splunk
https://registry.terraform.io/modules/terraform-google-modules/log-export/google/latest/examples/splunk-sink
https://cloud.google.com/architecture/security-foundations/logging-monitoring
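Per the Splunk streaming architecture above, logs flow sink -> Pub/Sub -> Dataflow -> Splunk HEC. A hedged sketch of the Pub/Sub leg only (topic and variable names are assumptions; the Dataflow Pub/Sub-to-Splunk job is not shown):

# Sketch - topic that a Dataflow Pub/Sub-to-Splunk job would subscribe to
resource "google_pubsub_topic" "splunk_export" {
  name    = "splunk-log-export"
  project = var.project_id
}

resource "google_logging_project_sink" "prod-log-sink-to-pubsub" {
  name                   = "prod-log-sink-to-pubsub"
  project                = var.project_id
  destination            = "pubsub.googleapis.com/projects/${var.project_id}/topics/${google_pubsub_topic.splunk_export.name}"
  filter                 = "severity >= INFO"
  unique_writer_identity = true
}

# the sink's writer identity needs publish rights on the topic
resource "google_pubsub_topic_iam_member" "sink_publisher" {
  project = var.project_id
  topic   = google_pubsub_topic.splunk_export.name
  role    = "roles/pubsub.publisher"
  member  = google_logging_project_sink.prod-log-sink-to-pubsub.writer_identity
}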
20231015: work starting on the older branch v20230917, adding an alternate org sink until the main branch is stabilized - https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/releases/tag/v20230917 - see #332
Branching off the 332 branch for PRs related to log sinks.
Use:
diff against the 332 branch: https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/pull/334/files
diff against the main branch: https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/pull/333/files
Architecture
Current state
The organization log sink is there at the org root, under the sink_name variable defined at https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/blob/main/environments/common/common.auto.tfvars#L73
the logs are warning level only
see https://github.com/GoogleCloudPlatform/pbmm-on-gcp-onboarding/blob/main/docs/google-cloud-security-controls.md#cloud-logging---logs-router
the logs route to the bucket storage.googleapis.com/tzpeaudittlz with a retention of 1 sec
under coldline storage
for example authenticated URL https://storage.cloud.google.com/tzpeaudittlz/cloudaudit.googleapis.com/activity/2023/01/31/17%3A00%3A00_17%3A59%3A59_S0.json
or a later PSC endpoint https://storage.cloud.google.com/tzpeaudittlz/cloudaudit.googleapis.com/activity/2023/09/06/14%3A00%3A00_14%3A59%3A59_S1.json?_ga=2.86505105.-885096055.1674837219
Future state
Org sink https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/logging_organization_sink
Project sink https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/logging_project_sink
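For the future-state org sink, a minimal sketch that mirrors the current-state behaviour (warning-level logs to a GCS bucket); the org ID, sink name, and bucket variables are assumptions:

# Sketch only - org-level aggregated sink to a GCS bucket
resource "google_logging_organization_sink" "org-sink" {
  name             = var.sink_name
  org_id           = var.org_id
  destination      = "storage.googleapis.com/${var.org_log_bucket_name}"
  filter           = "severity >= WARNING"
  # include logs from all child folders and projects
  include_children = true
}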