apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Feature Request]: Docker images with both runtime and launch environments #32387

Status: Open. amardeep opened this issue 2 months ago.

amardeep commented 2 months ago

What would you like to happen?

The container environments documentation describes how to create custom Docker images for the runtime environment, but the launch step in the example still needs access to the pipeline source file.

Could you please document a setup in which a single Docker image contains both the runtime and the launch configuration, so that one docker run invocation is enough to launch the job together with its dependencies?
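One way to approximate this today might be to bake a small launcher into the same image that serves as the SDK runtime container, so that the entrypoint submits the pipeline and points the workers back at that image. The sketch below only assembles the pipeline options and assumes the Beam Python SDK on Dataflow. The helper `build_launch_args` and the environment variable names (`BEAM_IMAGE`, `GCP_PROJECT`, `GCP_REGION`, `TEMP_LOCATION`) are hypothetical; `--sdk_container_image` and `--sdk_location=container` are existing Beam Python options, but check them against the Beam version you use.

```python
import os
from typing import List, Mapping, Optional


def build_launch_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build Dataflow pipeline options for a launcher running inside the image.

    Hypothetical convention: the image's own URI and the GCP settings are
    passed in as environment variables (for example via `docker run -e ...`).
    Missing variables raise KeyError so misconfiguration fails fast.
    """
    env = os.environ if env is None else env
    image = env["BEAM_IMAGE"]  # hypothetical: URI of this same image
    return [
        "--runner=DataflowRunner",
        f"--project={env['GCP_PROJECT']}",
        f"--region={env['GCP_REGION']}",
        f"--temp_location={env['TEMP_LOCATION']}",
        # Workers run the same image as the launcher, so both share dependencies.
        f"--sdk_container_image={image}",
        # Use the Beam SDK already installed in the container instead of staging one.
        "--sdk_location=container",
    ]
```

In the image's entrypoint these arguments would then be handed to the pipeline, for example `beam.Pipeline(options=PipelineOptions(build_launch_args()))`, and the job could be started with something like `docker run -e BEAM_IMAGE=... -e GCP_PROJECT=... IMAGE`. The launcher still needs Google Cloud credentials inside the container, which this sketch does not handle.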

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)


liferoad commented 2 months ago

Have you checked these resources: https://cloud.google.com/dataflow/docs/guides/using-custom-containers and https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies

amardeep commented 2 months ago

I have. It works, but it is not the most convenient option: alongside the Docker image, it also requires maintaining a file in Google Cloud Storage. So it is not as simple as building a Docker image and running it. It would be nicer if Beam supported this natively.
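To make the difference concrete, the sketch below lists the commands each workflow needs. The gcloud subcommands and flags (`gcloud dataflow flex-template build` with `--image` and `--sdk-language`, `gcloud dataflow flex-template run` with `--template-file-gcs-location` and `--region`) come from the Dataflow flex template tooling; the helper functions, bucket, image, and job names are hypothetical placeholders. The single-image variant assumes an image whose entrypoint launches the pipeline itself, which is what this issue asks for and is not something Beam provides out of the box.

```python
from typing import List


def flex_template_workflow(image: str, spec_gcs_path: str, job: str, region: str) -> List[List[str]]:
    """Flex templates: write a spec file to GCS, then launch from that file."""
    return [
        # Step 1: writes a template spec file to GCS that points at the image.
        ["gcloud", "dataflow", "flex-template", "build", spec_gcs_path,
         f"--image={image}", "--sdk-language=PYTHON"],
        # Step 2: launches the job from the spec file rather than from the image.
        ["gcloud", "dataflow", "flex-template", "run", job,
         f"--template-file-gcs-location={spec_gcs_path}", f"--region={region}"],
    ]


def single_image_workflow(image: str, region: str) -> List[List[str]]:
    """Hypothetical: the image's entrypoint submits the job by itself."""
    # GCP_REGION is a hypothetical variable the in-image launcher would read.
    return [["docker", "run", "--rm", "-e", f"GCP_REGION={region}", image]]
```

The flex template path needs two commands and a spec file in GCS that has to be kept in sync with the image; the requested single-image path would need only the image.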