apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

Request payload size exceeds the limit: 10485760 bytes #18660

Open kennknowles opened 2 years ago

kennknowles commented 2 years ago

I wrote a Python Dataflow job to read data from BigQuery, apply some transforms, and save the result as a BigQuery table.

I tested with 8 days of data and it works fine; when I scaled to 180 days I get the error below:

"message": "Request payload size exceeds the limit: 10485760 bytes.",

"error": {
"code": 400,
"message": "Request payload size exceeds the limit: 10485760 bytes.",
"status": "INVALID_ARGUMENT"
}

In short, this is what I'm doing:

1. Reading data from a BigQuery table using beam.io.BigQuerySource
2. Partitioning each day's data using beam.Partition
3. Applying transforms to each partition and combining some output PCollections
4. Saving the transform results to a BigQuery date-partitioned table
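For step 2, beam.Partition calls a partition function `fn(element, num_partitions)` for every element and routes it to the returned bucket index. A minimal sketch of a per-day partition function is below; the `event_date` field, the `start` date, and the function name are assumptions for illustration, not the reporter's actual schema:

```python
from datetime import date

# Hypothetical partition function for beam.Partition: maps each record to
# one of `num_partitions` day buckets, counted from an assumed start date.
# beam.Partition invokes this as fn(element, num_partitions).
def partition_by_day(record, num_partitions, start=date(2018, 1, 1)):
    offset = (record["event_date"] - start).days
    return offset % num_partitions  # keep the index in [0, num_partitions)

# In the pipeline this would be wired roughly as:
#   partitions = rows | beam.Partition(partition_by_day, 180)
# giving a tuple of 180 PCollections, one per day bucket.
```

Note that with 180 partitions the pipeline graph grows roughly 180-fold compared to the 8-day run, which is consistent with the job-submission request (not the data itself) exceeding a size limit.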

Imported from Jira BEAM-3455. Original Jira may contain additional context. Reported by: unais.

baeminbo commented 1 year ago

I read the discussion in the mailing list. I believe Dataflow doesn't have a 10 MB request size limit. IIUC, there are only a size limit for a single element value in Streaming Engine (80 MB) and a CommitRequest limit (2 GB?). See https://cloud.google.com/dataflow/quotas.

In the meantime, Pub/Sub still has a 10 MB limit for "Publish request" and "Message size": https://cloud.google.com/pubsub/quotas#resource_limits.
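The 10485760 bytes in the error message is exactly 10 MiB, which matches the Pub/Sub per-message limit cited above. A minimal sketch of guarding a serialized payload against that quota before publishing (the helper name is hypothetical):

```python
# Pub/Sub's documented per-message limit: 10 MB = 10 * 1024 * 1024 bytes,
# exactly the 10485760 figure reported in the error.
MAX_PUBSUB_MESSAGE_BYTES = 10 * 1024 * 1024

def fits_pubsub_limit(payload: bytes) -> bool:
    # Hypothetical helper: check a serialized payload against the quota
    # up front instead of relying on a 400 INVALID_ARGUMENT at publish time.
    return len(payload) <= MAX_PUBSUB_MESSAGE_BYTES
```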

Can we ask the reporter to confirm whether this issue still happens with the latest Beam version?