cloudflare / cloudflare-gcp

Google Cloud Function to push JSON files from Google Cloud Storage to BigQuery
Apache License 2.0

"No new logs" on all cloud function executions #89

Closed andreeleuterio closed 3 years ago

andreeleuterio commented 3 years ago

Hi folks, I tried to deploy the dashboard following these instructions. All executions of the cloud function run successfully but give a "No new logs" output. I don't see the configured dataset in BigQuery, which leads me to believe it wasn't created because the function didn't find new logs.

I can see logs coming into the bucket. I configured the DIRECTORY env var to / because the date folders are in the root of the bucket. I also see the Pub/Sub topic and the cron job. Any ideas on how to further debug this?

cc @mohammadualam

shagamemnon commented 3 years ago

Hey @andreeleuterio - can you post a few more pieces of information here to help replicate? What I need is ..

andreeleuterio commented 3 years ago

@shagamemnon thanks for the quick reply. Here's the information:

  1. These variables don't contain sensitive information, so I'm fine posting them all here:

    SCHEMA="schema-http.json"
    # The name of the subdirectory in your bucket used for Cloudflare Logpush logs,
    # for example, "logs/". If there is no subdirectory, use "/"
    DIRECTORY="/"
    BUCKET_NAME="cloudflare-sourcegraph-dot-com-logs"
    DATASET="cloudflare_logstream"
    TABLE="cloudflare_logs"
    REGION="us-central1"
    # You probably don't need to change these values:
    FN_NAME="cf-logs-to-bigquery"
    TOPIC_NAME="every_minute"

    which leads to the following commands:

    gcloud pubsub topics create every_minute
    gcloud scheduler jobs create pubsub cf_logs_cron --schedule="* * * * *" --topic=every_minute --message-body="60 seconds passed"
    gcloud functions deploy cf-logs-to-bigquery \
    --runtime nodejs12 \
    --trigger-topic every_minute \
    --region=us-central1 \
    --memory=1024MB \
    --entry-point=runLoadJob \
    --set-env-vars DATASET=cloudflare_logstream,TABLE=cloudflare_logs,SCHEMA=schema-http.json,BUCKET_NAME=cloudflare-sourcegraph-dot-com-logs,DIRECTORY=/
  2. https://storage.cloud.google.com/cloudflare-sourcegraph-dot-com-logs/20211118/20211118T000002Z_20211118T000032Z_1ba9cfac.log.gz (a listing sketch follows below)
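
As a sanity check, here's a minimal listing sketch (assuming application-default credentials, with the bucket name taken from the config above) that prints a few object names under the date prefix, which is what the function's generated prefix has to match:

    const { Storage } = require('@google-cloud/storage');

    async function main() {
      const bucket = new Storage().bucket('cloudflare-sourcegraph-dot-com-logs');
      // Logpush writes into date "folders" like 20211118/, so listing with
      // that prefix shows the exact names a generated prefix must match.
      const [files] = await bucket.getFiles({ prefix: '20211118/' });
      files.slice(0, 5).forEach((f) => console.log(f.name));
    }

    main().catch(console.error);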

goaaron commented 3 years ago

@andreeleuterio @shagamemnon
I'm thinking it's a timestamp issue. The Node function is outputting a prefix datetime that is 12 hours behind in my usage, so nothing from bucket.getFiles will appear for 12 hours.

I've solved this problem by swapping loadJobDeadline.toFormat(`yyyyMMdd'T'hhmm`) for loadJobDeadline.toFormat(`yyyyMMdd'T'HHmm`): in Luxon's format tokens, hh is the 12-hour clock while HH is the 24-hour clock, so an afternoon run generates a prefix from the wrong half of the day.
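
For illustration, a minimal sketch of the token difference (the sample timestamp here is hypothetical; toFormat is Luxon's API):

    const { DateTime } = require('luxon');

    // A hypothetical afternoon run at 16:05 UTC.
    const loadJobDeadline = DateTime.fromISO('2021-11-18T16:05:00Z', { zone: 'utc' });

    // 'hh' is the 12-hour clock, so 16:05 formats as 04:05 and the
    // resulting prefix points up to 12 hours into the past:
    console.log(loadJobDeadline.toFormat("yyyyMMdd'T'hhmm")); // 20211118T0405

    // 'HH' is the 24-hour clock, matching the timestamps in the
    // Logpush object names:
    console.log(loadJobDeadline.toFormat("yyyyMMdd'T'HHmm")); // 20211118T1605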

shagamemnon commented 3 years ago

@goaaron good catch. Thanks for pointing this out -- this is a change we'll need to make in master :)