cloudflare / cloudflare-gcp

Google Cloud Function to push json files from GC Storage to Big Query
Apache License 2.0
84 stars 49 forks source link

corrected data types #62

Open thecodeassassin opened 3 years ago

thecodeassassin commented 3 years ago

The reason for this PR is to correct the data types used in the schemas.

If you don't use the correct data types the following errors occur:

{"reason":"invalidQuery","location":"query","message":"Cannot return an invalid timestamp value of -8505438273863876608 microseconds relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field EdgeEndTimestamp"}

This PR fixes the incorrect data types

shagamemnon commented 3 years ago

Hey @thecodeassassin - thanks for putting this in :)

The default timestamp format for Logpush is actually RFC3339 (e.g. 2019-10-12T07:20:50.52Z):

Image of Logpush Defaults

... which is of type TIMESTAMP in BigQuery

It looks like you are using one of the Unix time options instead? Assuming I am reading this correctly, I'm afraid we'll need to find a different solution for your use case.

thecodeassassin commented 3 years ago

@shagamemnon thank you for the quick response.

We created the logpush job via terraform:

  for_each = {
    for dataset in local.datasets : dataset.id => dataset
  }

  zone_id = data.cloudflare_zones.zone_id_domain[each.value.domain].zones[0].id
  enabled = true

  name                = replace("${each.value.domain}-${each.value.id}", "/([\\.-_])+/", "-")
  destination_conf    = "gs://${google_storage_bucket.logs[each.key].name}"
  ownership_challenge = data.http.ownership_file[each.key].body
  dataset             = each.value.dataset
  logpull_options     = each.value.logpull_options
}

Also, on the docs it says int or string.

https://developers.cloudflare.com/logs/log-fields

Maybe it's only RFC3339 when you create it via the UI?

thecodeassassin commented 3 years ago

image of settings this is what terraform configured by default.

This is possible:

loggpull_options = "fields=RayID,ClientIP,EdgeStartTimestamp&timestamps=rfc3339"

But by default it seems that if you create the logpush job via the API, the timestamps have UNIX NANO format. I can't find anything in the terraform provider to suggest otherwise.