Open lostluck opened 2 years ago
Open Question: What's the correct behavior here for Cross Language containers? If it's the "standard" path for a beam release container, but running on dataflow, should we make the substitution for them as well. I'd assume yes.
Answer from @chamikaramj : for Xlang, the host SDK needs to do the translation of paths. Expansion services have no idea if a request is to be used on Dataflow or not (or whatever runner is executing their reponses).
So the Go Dataflow runner package will need to simply take the default beam image paths, and make the substitutions itself uniformly for all environments in the pipeline.
We really need to get away from having dataflow-specific containers for anything.
For Go, the containers are not Dataflow specific. They are the same release containers, cloned to GCR.
It makes sense for reduced user costs to keep things in region/network so there's less likelyhood of ingress costs for them, and possibly reduced startup latency.
They are the same containers for Java/Python as well. We just copy the released Beam SDK harness containers to GCR (for improving reliability/latency I believe).
What would you like to happen?
Java and Python SDKs use cloned external images from Dockerhub cached in GCR by default instead.
https://github.com/apache/beam/search?q=v1beta3%2F
Containers live here: https://console.cloud.google.com/gcr/images/cloud-dataflow/global/v1beta3
The Go SDK should do the same, to reduce startup time. Container paths would be like
gcr.io/cloud-dataflow/v1beta3/beam_go_sdk:2.37.0
and similar for each version.The Go SDK container is significantly smaller than Java and Python's due to the static compilation nature of Go, so this likely reduced issues requiring the proximity caching. However, for parity this would be useful.
~~Images are also hosted on Google Artifact Registry on a Regional basis: eg. https://console.cloud.google.com/artifacts/docker/google.com:dataflow-containers/us-west1/worker/v1beta3%2Fbeam_go_sdk?project=google.com:dataflow-containers
With paths like
us-west1-docker.pkg.dev/google.com/dataflow-containers/worker/v1beta3/beam_go_sdk:2.42.0
But this also appears to require certain additional gcloud authentications to work 100%, so unless dataflow service accounts/workers are automatically authenticated for fetching images, it may not yet be worthwhile to implement this.~~
GAR containers are substituted by the dataflow service assuming the GCR paths are used.
Issue Priority
Priority: 3
Issue Component
Component: sdk-go