apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Error, message length too large: found 7666438 bytes, the limit is: 4194304 bytes #773

Closed andygrove closed 9 months ago

andygrove commented 1 year ago

Describe the bug

I tried running some benchmarks, but some queries fail with this error:

2023-05-14T16:00:52.679602Z  WARN tokio-runtime-worker ThreadId(47) ballista_executor::execution_loop: Executor poll work loop failed. If this continues to happen the Scheduler might be marked as dead. Error: status: OutOfRange, message: "Error, message length too large: found 7666438 bytes, the limit is: 4194304 bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sun, 14 May 2023 16:00:52 GMT"} }    

To Reproduce

Start cluster:

./target/release/ballista-scheduler
./target/release/ballista-executor -c 24

Run TPC-H benchmarks

Expected behavior

The benchmark queries should not fail.

Additional context

yahoNanJing commented 1 year ago

Hi @andygrove, we have hit the same issue. As a temporary fix, I will propose a PR that adds a config option to make the maximum decoding message size configurable.

andygrove commented 9 months ago

I am still running into this error with the latest code.

2023-12-11T14:31:18.347839Z  WARN          task_runner ThreadId(82) ballista_executor::cpu_bound_executor: Spawned task output ignored: receiver dropped    
2023-12-11T14:31:18.484649Z  WARN tokio-runtime-worker ThreadId(45) ballista_executor::execution_loop: Executor poll work loop failed. If this continues to happen the Scheduler might be marked as dead. Error: status: OutOfRange, message: "Error, message length too large: found 7700152 bytes, the limit is: 4194304 bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Mon, 11 Dec 2023 14:31:18 GMT"} }    

I am using the default --grpc-server-max-decoding-message-size of 16 MB, but the limit still appears to be 4 MB.
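For reference, the numbers in the log can be checked with a small sketch. It is only arithmetic on the values quoted above, not Ballista or tonic code. The helper `fits` is hypothetical, and it assumes "16 MB" means 16 MiB.

```rust
/// Bytes in one mebibyte.
const MIB: usize = 1024 * 1024;

/// Hypothetical helper: true when a message of `len` bytes fits within `limit` bytes.
fn fits(len: usize, limit: usize) -> bool {
    len <= limit
}

fn main() {
    let reported_limit: usize = 4_194_304; // limit quoted in the log line
    let message_len: usize = 7_700_152; // message size quoted in the log line
    let configured = 16 * MIB; // assumption: "16 MB" is 16 MiB

    // The limit in the error is exactly 4 MiB, not the configured 16 MiB.
    assert_eq!(reported_limit, 4 * MIB);

    println!("fits in reported limit: {}", fits(message_len, reported_limit));
    println!("fits in configured limit: {}", fits(message_len, configured));
}
```

The message would fit comfortably within 16 MiB, so the error suggests that some code path is still applying a 4 MiB limit.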

andygrove commented 9 months ago

We currently set the decoding max size but not the encoding max size, so perhaps that is the issue. I will test this.

Dandandan commented 9 months ago

We've hit some other errors related to max sizes at our end (Coralogix). We reduced those errors by:

Dandandan commented 9 months ago

Some other things we did:

andygrove commented 9 months ago

I confirmed that setting the max encoding size resolves the issue for me.

andygrove commented 9 months ago

We set max encode/decode message size when creating the gRPC servers, but not for the clients, so I ran into this again.
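A minimal model of why raising the limits on the servers alone is not enough, assuming each side of a connection enforces its own encode and decode limits. The `Limits` struct and `delivered` function are hypothetical and are not Ballista or tonic types. The 4 MiB value for a side that was left alone is taken from the error message, and using it for both directions is an assumption.

```rust
/// Bytes in one mebibyte.
const MIB: usize = 1024 * 1024;

/// Hypothetical per-endpoint message size limits, in bytes.
#[derive(Clone, Copy)]
struct Limits {
    max_encode: usize,
    max_decode: usize,
}

/// A message gets through only if the sender is allowed to encode it
/// and the receiver is allowed to decode it.
fn delivered(len: usize, sender: Limits, receiver: Limits) -> bool {
    len <= sender.max_encode && len <= receiver.max_decode
}

fn main() {
    // Assumption: a side that was never configured uses the 4 MiB limit from the log.
    let unraised = Limits { max_encode: 4 * MIB, max_decode: 4 * MIB };
    let raised = Limits { max_encode: 16 * MIB, max_decode: 16 * MIB };
    let len: usize = 7_700_152; // message size from the second log

    // Server limits raised, client left alone: the response is rejected by the client.
    println!("server only: {}", delivered(len, raised, unraised));
    // Limits raised on both server and client: the response goes through.
    println!("both sides:  {}", delivered(len, raised, raised));
}
```

In this model a 7.7 MB response only gets through once both the sending and the receiving side allow it, which matches the observation that the clients need the same settings as the servers.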