SumoLogic / terraform-provider-sumologic

Terraform provider for Sumo Logic
https://www.terraform.io/docs/providers/sumologic/
Mozilla Public License 2.0

using cutoff_relative_time in sumologic_s3_source always causes resource replacement and re-ingest #561

Open hashi-breed opened 1 year ago

hashi-breed commented 1 year ago

We have a number of high-volume sumologic_s3_source data sources configured where SNS notification is not possible (we don't control the bucket, the vendor doesn't support SNS signaling, etc.).

We initially used static cutoff_timestamp properties on these S3 pollers to limit the amount of potential re-ingest at deployment time. As these resources age, the static timestamps end up reaching much further back than we need, and/or past our configured partition retention limits, causing complete bucket re-ingest and throttling whenever poller-side state is lost or a resource is redeployed.

We switched to cutoff_relative_time to set a maximum RPO (usually -24h). However, on every redeploy, cutoff_relative_time appears to be materialized into a concrete cutoff_timestamp (at least when viewed via https://api.sumologic.com/api/v1/collectors/MMMM/sources/NNNN), and cutoff_relative_time comes back empty ("") in the plan. That diff forces replacement of the resource, and since the replacement is a brand-new S3 poller, it re-ingests every key back to the freshly calculated cutoffTimestamp.

Example config:

resource "sumologic_collector" "not_our_s3_buckets" {
  name        = "s3-buckets"
  description = "External Bucket Collector"
  timezone    = "Etc/UTC"
}

resource "sumologic_s3_source" "s3_poller_source" {
  name                 = "ext-s3-poller"
  description          = "Logs that must be polled via S3"
  category             = "some/unfortunate/category"
  content_type         = "AwsS3Bucket"
  scan_interval        = 300000 # 300 * 1000 ms = 5m
  cutoff_relative_time = "-1d"
  paused               = false
  collector_id         = sumologic_collector.not_our_s3_buckets.id

  authentication {
    type       = "S3BucketAuthentication"
    access_key = var.not_our_s3_access_key
    secret_key = var.not_our_s3_secret_key
  }

  path {
    type              = "S3BucketPathExpression"
    bucket_name       = "some-bucket-with-lots-of-keys-and-large-objects"
    path_expression   = "data/*gz"
  }
}

on plan:

...
sumologic_s3_source.s3_poller_source:

~ cutoff_relative_time = "" -> "-1d" # forces replacement
~ cutoff_timestamp     = 1690836905709 -> 0
...

I'd rather not add lifecycle/ignore_changes blocks to these resources if possible.
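
For reference, the kind of lifecycle/ignore_changes workaround I'd like to avoid would look roughly like this (just a sketch; I'm assuming that ignoring both cutoff attributes is enough to suppress the replacement-forcing diff):

resource "sumologic_s3_source" "s3_poller_source" {
  # ... same arguments as the example above ...

  lifecycle {
    # Assumption: ignoring the cutoff attributes hides the diff that
    # appears once the API materializes cutoff_relative_time into a
    # concrete cutoff_timestamp, so the resource is not replaced.
    ignore_changes = [
      cutoff_relative_time,
      cutoff_timestamp,
    ]
  }
}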

dlinsumo commented 1 year ago

Internal ticket SUMO-227399

wjakelee commented 1 year ago

Response from internal team: “cutoff_relative_time is a non-modifiable field, so changing it forces Terraform to tear down the resource and create a new one. This is expected behavior. Ref: https://help.sumologic.com/docs/send-data/use-json-configure-sources/#common-parameters-for-log-source-types. If you search for cutoffRelativeTime on that page, you can see the description of the field. The better approach is to use the cutoff_timestamp parameter and keep updating it during redeploys.”
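
In Terraform terms, that suggestion amounts to something like the following, with the epoch-milliseconds value bumped by hand (or by tooling) on every redeploy; the value shown is just the one from the plan output above:

resource "sumologic_s3_source" "s3_poller_source" {
  # ... other arguments as in the original config ...

  # Epoch milliseconds; must be advanced manually on each redeploy
  # to keep the re-ingest backstop current.
  cutoff_timestamp = 1690836905709
}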

hashi-breed commented 1 year ago

I understand the technical constraints behind this answer, but the implications are counter-intuitive and the ergonomics add further complications. A minimal implementation that keeps three RPO offsets self-adjusted to the time of plan creation would look like this (n.b. plantimestamp() is only available in Terraform 1.5 and later):

terraform {
  required_version = "~>1.5"
  required_providers {
    time = {
      source  = "hashicorp/time"
      version = "~> 0.9.1"
    }
  }
}

locals {
  base_time      = plantimestamp()
}

resource "time_offset" "minus_1h" {
  base_rfc3339 = local.base_time
  offset_hours = -1
}

resource "time_offset" "minus_24h" {
  base_rfc3339 = local.base_time
  offset_hours = -24
}

resource "time_offset" "minus_168h" {
  base_rfc3339 = local.base_time
  offset_hours = -168
}

locals {
  timestamp_offset  = {
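    # multiply by 1000: Sumo Logic's cutoff_timestamp is epoch milliseconds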
    "-1h"   = (time_offset.minus_1h.unix   * 1000)
    "-1d"   = (time_offset.minus_24h.unix  * 1000)
    "-7d"   = (time_offset.minus_168h.unix * 1000)
  }
}

output "timestamp_offset" {
  value = local.timestamp_offset
}

That would indeed let you use cutoff_timestamp = local.timestamp_offset["-1h"] in the source config (see the sketch just below), but from my reading it would only adjust on each deploy. If you wanted to keep that same backstop RPO current over time, you'd have to constantly re-deploy.
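
Wired in, that looks roughly like this (a sketch of the style I'm converting to, not yet verified end to end):

resource "sumologic_s3_source" "s3_poller_source" {
  # ... other arguments unchanged from the earlier example ...

  # Recomputed from plantimestamp() on every plan, so it only moves
  # forward when we actually run a deploy.
  cutoff_timestamp = local.timestamp_offset["-1h"]
}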

E.g. if you wanted to limit accidental re-ingest of polled S3 data to no more than 1h of duplicate events, you'd have to re-deploy every hour or so to keep ratcheting the cutoff_timestamp forward.

I'm still in the process of converting our S3 pollers over to this style of config; I'll update here when that's complete.