GoogleCloudPlatform / workflows-demos

A collection of Workflows samples for various use cases
Apache License 2.0
131 stars 36 forks

Create a sample for event-triggered Cloud Run Jobs where event payload is stored in GCS #100

Status: Closed (steren closed 1 year ago)

steren commented 1 year ago

This is a different request from https://github.com/GoogleCloudPlatform/workflows-demos/issues/99

In https://github.com/GoogleCloudPlatform/workflows-demos/issues/99, we want the event payload to be passed as an env var to the Job execution.

Some customers will need the events to be encrypted with their own encryption keys, which is not something the env vars support. We should offer another sample that does these:

meteatamel commented 1 year ago

@steren, I need more details to understand the sample request.

  1. We're talking about Eventarc->Workflows->Cloud Run jobs. What events do we want to route from Eventarc to Workflows? GCS events, Pub/Sub, or does this not matter?
  2. When Workflows receives the event, do we want the whole event saved to a bucket, or just the data field of the event? Something like this? https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/workflows-eventarc-integration/event-payload-storer/event-payload-storer.yaml
  3. Then Workflows runs a Cloud Run job, passing the created event GCS file as a container argument? Why not as an env var? Is it because we don't want to show the event's GCS file in the env vars?
  4. What does the Cloud Run job do with the event GCS file?

I guess I'm having a hard time understanding the point of this sample. What's the point of redirecting the event to a GCS bucket?

steren commented 1 year ago

The goal is to provide a variant of https://github.com/GoogleCloudPlatform/workflows-demos/issues/99 where the event attributes, instead of being stored as execution overrides, are stored temporarily as a file in a GCS bucket, and that file is then passed as input to the job.

You can use any event type, but it is best not to use a GCS event, to avoid confusion with the fact that we'll also use GCS to store event data.

The use case is that some customers do not want event data to be captured as env vars, because env vars would not be encrypted with CMEK. Instead, we can leverage GCS to encrypt the event data with CMEK.

The Cloud Run job should read that GCS file and do something with the event payload.
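
A minimal sketch of what the job side could look like, in Python. The env var names INPUT_BUCKET and INPUT_FILE are assumptions borrowed from the #99 sample, `download_from_gcs` and `load_event` are hypothetical helpers, and the stored payload is assumed to be JSON. Assuming the bucket is configured with a CMEK key, Cloud Storage handles decryption on read, so the job itself should not need any key-handling code.

```python
import json
import os
from typing import Callable, Mapping


def download_from_gcs(bucket_name: str, object_name: str) -> bytes:
    """Fetch an object's bytes from Cloud Storage."""
    # Imported lazily so the parsing logic below can be exercised
    # without the google-cloud-storage client library installed.
    from google.cloud import storage

    client = storage.Client()
    return client.bucket(bucket_name).blob(object_name).download_as_bytes()


def load_event(
    env: Mapping[str, str],
    fetch: Callable[[str, str], bytes] = download_from_gcs,
) -> dict:
    """Read the event payload that the workflow stored in GCS.

    Only the file location travels through env vars (assumed names);
    the payload itself stays in the bucket.
    """
    bucket_name = env["INPUT_BUCKET"]
    object_name = env["INPUT_FILE"]
    return json.loads(fetch(bucket_name, object_name))


def main() -> None:
    """Hypothetical job entrypoint: load the event and do something with it."""
    event = load_event(os.environ)
    print(f"Processing event with keys: {sorted(event)}")
```

Calling `main()` as the container's entrypoint would read the location from the execution's env vars; passing a different `fetch` function makes the parsing testable without a real bucket.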

meteatamel commented 1 year ago

@steren, the thing is, in #99 we're basically telling the Cloud Run job: "Process this file in this bucket" via INPUT_BUCKET and INPUT_FILE env variable overrides.

In this sample, let's assume I'm using Pub/Sub events. Eventarc takes that event and passes it to Workflows; Workflows extracts the payload, saves it to a file in a GCS bucket, and tells the Cloud Run job to process that file, again via env variables. How is this any different from #99, and what value does it add other than showing how to save files from Workflows?
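
The flow described above can be sketched in Python to make the data path concrete. This is only an illustration of the logic, not the sample itself, which would presumably be written in Workflows YAML. It assumes the Pub/Sub event carries its base64-encoded body under `data.message.data`, that the job is started with a request body using the `overrides.containerOverrides[].env` shape, and that one object is written per event id; the helper names and the `events/<id>.json` naming scheme are hypothetical.

```python
import base64


def extract_pubsub_payload(event: dict) -> bytes:
    """Return the decoded message body from a Pub/Sub event."""
    # Pub/Sub message bodies arrive base64-encoded.
    return base64.b64decode(event["data"]["message"]["data"])


def object_name_for(event: dict) -> str:
    """Hypothetical naming scheme: one GCS object per event id."""
    return f"events/{event['id']}.json"


def job_run_body(bucket: str, object_name: str) -> dict:
    """Build the job execution overrides.

    Only the location of the stored payload is placed in env vars;
    the payload bytes are written to the bucket separately.
    """
    return {
        "overrides": {
            "containerOverrides": [
                {
                    "env": [
                        {"name": "INPUT_BUCKET", "value": bucket},
                        {"name": "INPUT_FILE", "value": object_name},
                    ]
                }
            ]
        }
    }
```

With this split, the bytes returned by `extract_pubsub_payload` would be uploaded to the bucket under `object_name_for(event)`, and only the bucket and object names would appear in the execution overrides.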

steren commented 1 year ago

Some customers do not want the event payload to be in Cloud Run env vars, which are not encrypted with CMEK. Hence the need to store the event payload in GCS (where customers can enable CMEK) in this second sample.

This is supporting the broader "event triggered jobs" strategy. I shared with you a doc about this internally.

This second sample shouldn't use a GCS event, so that it doesn't blur the lines. Something like an Audit Log event would be better.

meteatamel commented 1 year ago

OK, but the event payload is not in the Cloud Run env vars in #99. Instead, the GCS bucket and file are in the Cloud Run env vars, and the Cloud Run job simply reads that file from that bucket. That's why I'm failing to see how #100 will be any different from #99.

Let's assume we have an AuditLog event. This will be captured by Eventarc and sent to Workflows. Then, Workflows will extract the payload of this AuditLog event, store it in a GCS file in a bucket, and call the Cloud Run job to process that file via env vars, right? How is this different from #99, which also processes files in a bucket?

The only difference is the extra logic in Workflows to take the payload of the AuditLog event and save it to the bucket. So, we're basically showing how to save an event payload to a GCS bucket. I don't mind creating this sample if this is all we want to show, but it doesn't feel different enough.

steren commented 1 year ago

Despite an internal doc, an internal chat, and explanations in this issue, it seems I am not able to convince you. I do not have more time to allocate to the topic. So I am closing the request.

steren commented 1 year ago

> we're basically showing how to save an event payload to a GCS bucket.

As I have shared multiple times, this is critical to customers' ability to encrypt event data with their own keys.