When I comment out the WindowInto line, the error goes away, but the pipeline still doesn't function as expected -- which might be an issue with my custom DoFn.
import apache_beam as beam

# options, subscription_id, schema_registry_conf, ConfluentAvroReader and
# SendToDatadog are defined elsewhere in the flex template.
with beam.Pipeline(options=options) as pipeline:
    messages = (
        pipeline
        | f"Read from input topic {subscription_id}" >> beam.io.ReadFromPubSub(
            subscription=subscription_id, with_attributes=False)
        | f"Deserialize Avro {subscription_id}" >> beam.ParDo(
            ConfluentAvroReader(schema_registry_conf)).with_outputs(
                "record", "error"))

    records = messages["record"]
    errors = messages["error"]

    (records
     | 'Aggregate msgs in fixed window' >> beam.WindowInto(beam.window.FixedWindows(15))  # 15-second windows
     | 'Send hardcoded value to datadog' >> beam.ParDo(SendToDatadog())
     | 'Print results' >> beam.Map(print)
    )
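For context, the SendToDatadog DoFn itself isn't included in this issue. A minimal sketch of what such a DoFn might look like (assuming the Datadog v2 metrics endpoint, an API key in a DD_API_KEY environment variable, and a placeholder metric name) is:

# Hypothetical sketch only -- the actual SendToDatadog is not shown in this issue.
# Assumes the Datadog v2 metrics intake endpoint and an API key in DD_API_KEY;
# the metric name and value are placeholders.
import os
import time

import apache_beam as beam
import requests


class SendToDatadog(beam.DoFn):
    def process(self, element):
        payload = {
            "series": [{
                "metric": "pipeline.records.count",  # placeholder metric name
                "type": 1,                           # 1 = count
                "points": [{"timestamp": int(time.time()), "value": 1}],
            }]
        }
        resp = requests.post(
            "https://api.datadoghq.com/api/v2/series",
            json=payload,
            headers={"DD-API-KEY": os.environ["DD_API_KEY"]},
            timeout=10,
        )
        resp.raise_for_status()
        yield element

A real implementation would likely batch points and handle retries rather than issuing one blocking POST per element.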
The error happens in the SDK internals and is rather strange. It sounds as though the runner is sending a malformed request to the SDK. I would suggest you try again, and if the issue still persists, provide a minimal pipeline that reproduces it that we could try out, or work with Dataflow customer support.
Thank you for the feedback! The issue has persisted -- I'm working on a minimal pipeline that can reproduce the error now.
Hi, any news about the repro? Thanks!
Thanks for the follow-up! I ended up going a different route by removing the Datadog module from the pipeline and setting up a separate container to relay messages from Pub/Sub to Datadog. It feels like Beam/Dataflow is structured around data transformation -- trying to coerce it into making external API calls seems like a dumb idea in hindsight (especially on an unpeered VPC).
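The relay container itself isn't shown in this thread. A rough sketch of that approach (assuming the google-cloud-pubsub client library, the same hypothetical DD_API_KEY variable and metric name as above, and placeholder project/subscription names) might look like:

# Hypothetical sketch of a standalone Pub/Sub-to-Datadog relay -- not part of the issue.
import os
import time

import requests
from google.cloud import pubsub_v1

PROJECT_ID = os.environ["PROJECT_ID"]            # placeholder
SUBSCRIPTION_ID = os.environ["SUBSCRIPTION_ID"]  # placeholder


def forward_to_datadog(message: pubsub_v1.subscriber.message.Message) -> None:
    # Emit one count point per received Pub/Sub message, then ack it.
    payload = {
        "series": [{
            "metric": "relay.messages.count",  # placeholder metric name
            "type": 1,                         # 1 = count
            "points": [{"timestamp": int(time.time()), "value": 1}],
        }]
    }
    requests.post(
        "https://api.datadoghq.com/api/v2/series",
        json=payload,
        headers={"DD-API-KEY": os.environ["DD_API_KEY"]},
        timeout=10,
    ).raise_for_status()
    message.ack()


subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

# Blocks and pulls messages until the process is interrupted.
with subscriber:
    future = subscriber.subscribe(subscription_path, callback=forward_to_datadog)
    future.result()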
What happened?
I'm trying to send custom metrics to Datadog using a DoFn, but the Python SDK test harness is failing with an error that I don't know how to interpret:
I'm running the streaming pipeline as a flex template in GCP Dataflow, using the Python requests module for POST calls.
Issue Priority
Priority: 3 (minor)
Issue Components