apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

Dataflow Python SDK logging: step_id is always empty string #19711

Open damccorm opened 2 years ago

damccorm commented 2 years ago

When using the Dataflow runner, log messages always show up in Stackdriver with step_id set to the empty string, so filtering log messages by step doesn't work.


resource: {
  labels: {
    job_id: "<job id>" 
    job_name: "<job name>" 
    project_id: "<project id>"
    region: "<region>" 
    step_id: "" 
  }
  type: "dataflow_step" 
}
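
For context, a per-step filter on these entries would normally match on the resource labels shown above. The snippet below is only a minimal sketch of how such a query string could be assembled. The helper name `step_log_filter` and the example values are hypothetical, and it assumes the standard Cloud Logging query syntax for resource labels. With step_id arriving as "", the last clause never matches.

```python
def step_log_filter(job_id: str, step_id: str) -> str:
    """Build a Cloud Logging query selecting one step's logs for one job."""
    clauses = [
        'resource.type="dataflow_step"',
        f'resource.labels.job_id="{job_id}"',
        # This is the clause that fails to match when step_id is logged as "".
        f'resource.labels.step_id="{step_id}"',
    ]
    return " AND ".join(clauses)


# Hypothetical values, for illustration only.
print(step_log_filter("my-job-id", "MyStep"))
```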

Another user seems to have posted in the old GitHub repo and appears to be seeing the same problem, based on their output:

https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/62

From what I can tell, this only affects streaming pipelines.

Imported from Jira BEAM-7934. Original Jira may contain additional context. Reported by: jimpremise.

dakl commented 1 year ago

Any update on this? It's an amazingly annoying bug.

hughack commented 1 year ago

Also keen on an update. Should this be here or is it more of a Google thing to fix?

BjornPrime commented 1 year ago

.take-issue

hughack commented 1 year ago

I've just realised the step_id is only missing for logs emitted in DoFn.setup. Logs emitted in process do appear to have it set correctly for me. It is less of an issue, but it would be nice to have it in setup as well.
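
A minimal sketch of a DoFn that could be used to compare the two cases described above. The class name `LoggingDoFn` is hypothetical, and the fallback base class is only there so the snippet can be exercised without Beam installed. This is not a confirmed reproduction, just one way to emit a log entry from each method.

```python
import logging

try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Fallback so the sketch can be read and exercised without Beam installed.
    _DoFnBase = object


class LoggingDoFn(_DoFnBase):
    """Hypothetical DoFn that logs from both setup() and process()."""

    def setup(self):
        # Called once per DoFn instance, before any element is processed.
        # Per the observation above, this entry reportedly has step_id "".
        logging.info("LoggingDoFn.setup called")

    def process(self, element):
        # Called per element; these entries reportedly carry the step_id.
        logging.info("LoggingDoFn.process got %r", element)
        yield element
```

Applied in a pipeline with `beam.ParDo(LoggingDoFn())` on the Dataflow runner, the two messages can then be looked up in the logs and their step_id labels compared.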

pierresegonne commented 9 months ago

Any update on this? :)