
[Bug] Beam errors out when using PortableRunner (Flink Runner) – `Cannot run program "docker"` #30663

Open harrymyburgh opened 5 months ago

harrymyburgh commented 5 months ago

What happened?

I am trying to deploy a Beam job (Python Beam) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster. I have not had issues with Beam on the Flink Runner before. However, today I tried to set Beam up as a consumer of Apache Kafka using ReadFromKafka from apache_beam.io.kafka.
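For reference, the read is essentially the following (a minimal sketch; the broker address and topic are placeholders, and the real pipeline does more than print):

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

# Minimal consumer sketch; broker address and topic are placeholders.
options = PipelineOptions()  # the actual flags I pass are listed further below
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "kafka:9092"},
            topics=["example-topic"],
        )
        | "Print" >> beam.Map(print)
    )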

My Flink Cluster is managed by the Apache Flink Kubernetes Operator.

My Beam jobs are managed by a Beam Flink Job Manager, which posts Beam jobs to the Flink master. The Job Manager uses the image apache/beam_flink1.16_job_server:2.54.0.

My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image apache/beam_python3.11_sdk:2.54.0 and the arg --worker_pool.

When I start my Beam job, I get the following error in the job manager logs:

Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory

These are my Beam pipeline options:

--job_name=beam_example_pipeline
--runner=PortableRunner
--job_endpoint=beam-flink-job-server:8099
--artifact_endpoint=beam-flink-job-server:8098
--environment_type=EXTERNAL
--environment_config=localhost:50000
--parallelism=1
--streaming
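
Equivalently, in the Python entry point the same flags are handed to PipelineOptions roughly like this (a sketch, not my exact launcher code):

from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: the same flags as above, constructed programmatically.
options = PipelineOptions([
    "--job_name=beam_example_pipeline",
    "--runner=PortableRunner",
    "--job_endpoint=beam-flink-job-server:8099",
    "--artifact_endpoint=beam-flink-job-server:8098",
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",
    "--parallelism=1",
    "--streaming",
])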

Some resources I've found suggest that the Kafka transform has its own environment type, which defaults to --environment_type=DOCKER (and overrides whatever environment you set?), and that this is what causes the issue. However, I could be wrong, so please say so if I am.

All of this is taking place on a Kubernetes cluster, where, to my knowledge, Docker-in-Docker is not recommended. I do not want to use a PROCESS environment_type; I require EXTERNAL. How can I resolve this issue? Is this a bug in Beam?
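
If the default Java environment of the Kafka expansion really is the culprit, the only kind of override I would know to try is passing environment flags through to the expansion service, roughly as sketched below. This is untested on my side: the flag names are what other issues suggest the expansion service accepts, and the endpoint is a placeholder that would have to point at a Java SDK worker pool rather than my Python one.

from apache_beam.io.kafka import ReadFromKafka, default_io_expansion_service

# Untested sketch: ask the Kafka expansion service to emit an EXTERNAL
# environment instead of the default DOCKER one. localhost:50001 is a
# placeholder for a Java SDK worker pool endpoint.
kafka_read = ReadFromKafka(
    consumer_config={"bootstrap.servers": "kafka:9092"},
    topics=["example-topic"],
    expansion_service=default_io_expansion_service(
        append_args=[
            "--defaultEnvironmentType=EXTERNAL",
            "--defaultEnvironmentConfig=localhost:50001",
        ]
    ),
)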

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

felipecaputo commented 2 months ago

I'm currently facing the same problem myself, using a Beam YAML pipeline. When it comes to running the SQL transformation, it tries to run a Docker container with the Java SDK in flink-main-container:

WARN  org.apache.beam.runners.fnexecution.environment.DockerCommand [] - Unable to pull docker image apache/beam_java17_sdk:2.56.0, cause: Cannot run program "docker": error=2, No such file or directory
ERROR org.apache.flink.runtime.operators.BatchTask                 [] - Error in task code:  CHAIN MapPartition (MapPartition at [3]Sql/BeamCoGBKJoinRel_77/Join.Impl/CoGroup.ExpandCrossProduct/{extractKeylhs, CoGroupByKey}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/1)
java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:508) ~[flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:357) ~[flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) [flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [flink-dist-1.17.2.jar:1.17.2]
java.lang.Thread.run(Unknown Source) [?:?]

The tasks executed in the python-harness container ran smoothly, but when it came to the SQL transformation, it failed.

[screenshots attached]