Progress update:

- The `--project` flag should be set.
- The service account passed via `--service_account_email` should have the Dataflow Worker and Storage Object Admin roles.
- Vertex AI Pipelines runs via the default (Compute Engine) service account; that service account should have the Service Account User role so it can delegate jobs to the other service account.

```python
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=" + GOOGLE_CLOUD_PROJECT,  # newly added; this flag must be set explicitly
    "--region=" + GOOGLE_CLOUD_REGION,
    "--service_account_email=" + DATAFLOW_SERVICE_ACCOUNT,
    "--machine_type=" + MACHINE_TYPE,
    "--experiments=use_runner_v2",
    "--max_num_workers=" + str(max_num_workers),
    "--disk_size_gb=" + str(disk_size),
]

example_gen = ImportExampleGen(...)
example_gen.with_beam_pipeline_args(beam_pipeline_args)
```
Since the full-resolution data is large, the training job hit OOM (out of memory) with the current setup. The following configuration did work, though:
```python
GCP_AI_PLATFORM_TRAINING_ARGS = {
    vertex_const.ENABLE_VERTEX_KEY: True,
    vertex_const.VERTEX_REGION_KEY: GOOGLE_CLOUD_REGION,
    vertex_training_const.TRAINING_ARGS_KEY: {
        "project": GOOGLE_CLOUD_PROJECT,
        "worker_pool_specs": [
            {
                "machine_spec": {
                    "machine_type": "n1-standard-8",         # was "n1-standard-4"
                    "accelerator_type": "NVIDIA_TESLA_V100",  # was "NVIDIA_TESLA_K80"
                    "accelerator_count": 1,
                },
                "replica_count": 1,
                "container_spec": {
                    "image_uri": PIPELINE_IMAGE,
                },
            }
        ],
    },
    "use_gpu": True,
}
```
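This dict is presumably handed to the Vertex AI `Trainer` extension through its `custom_config` argument so the training step runs as a Vertex AI custom job. Below is a minimal sketch of that wiring, assuming the TFX 1.x API and that the `vertex_const` / `vertex_training_const` aliases come from the `google_cloud_ai_platform` extension modules; the module file path, upstream component name, and step counts are placeholders, not taken from the original pipeline:

```python
from tfx import v1 as tfx
from tfx.extensions.google_cloud_ai_platform import constants as vertex_const
from tfx.extensions.google_cloud_ai_platform.trainer import executor as vertex_training_const

# Run the Trainer as a Vertex AI custom training job by passing the
# training args through custom_config. Names below are placeholders.
trainer = tfx.extensions.google_cloud_ai_platform.Trainer(
    module_file=_trainer_module_file,                 # placeholder: path to the trainer module
    examples=example_gen.outputs["examples"],         # placeholder: upstream ExampleGen output
    train_args=tfx.proto.TrainArgs(num_steps=1000),   # illustrative step counts
    eval_args=tfx.proto.EvalArgs(num_steps=100),
    custom_config=GCP_AI_PLATFORM_TRAINING_ARGS,
)
```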
The default VM running the `ImportExampleGen` step of a Vertex Pipeline does not have sufficient resources to handle the entire raw Sidewalk datasets. Hence, we need to integrate Dataflow into the `ImportExampleGen` component. To do this, we can call the `with_beam_pipeline_args()` method on `ImportExampleGen` with appropriate Dataflow configurations.
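For reference, here is a minimal, self-contained sketch of that wiring, assuming the TFX 1.x API; the project, region, bucket, and service-account values below are placeholders, not the actual values from this pipeline:

```python
from tfx import v1 as tfx

# Placeholder values for illustration only.
GOOGLE_CLOUD_PROJECT = "my-gcp-project"
GOOGLE_CLOUD_REGION = "us-central1"
DATAFLOW_SERVICE_ACCOUNT = "dataflow-worker@my-gcp-project.iam.gserviceaccount.com"

# Dataflow settings passed through Beam so ImportExampleGen runs on Dataflow
# workers instead of the resource-limited default Vertex Pipelines VM.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=" + GOOGLE_CLOUD_PROJECT,
    "--region=" + GOOGLE_CLOUD_REGION,
    "--service_account_email=" + DATAFLOW_SERVICE_ACCOUNT,
]

# Assumes the raw data is already in TFRecord format under this GCS prefix.
example_gen = tfx.components.ImportExampleGen(
    input_base="gs://my-bucket/sidewalk-tfrecords"
)
example_gen.with_beam_pipeline_args(beam_pipeline_args)
```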