Azure / AML-Kubernetes

AzureML customer managed k8s compute samples
MIT License

local k8s compute instance type via SDK #262

Open amahab opened 1 year ago

amahab commented 1 year ago

I’m evaluating Azure ML on Azure Arc-enabled K8s.

I am able to specify the target compute instance type created in k8s via the CLI v2 job YAML file: https://github.com/Azure/AML-Kubernetes/blob/master/examples/training/simple-train-cli/job.yml

The local k8s compute instance types are created as per this link - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-kubernetes-instance-types

Is there a Python SDK method to specify the instance type for the target k8s compute cluster? I could not find examples or SDK documentation for this.

Thanks!

jiaochenlu commented 1 year ago

@amahab Here is the SDK example of using Kubernetes compute for training jobs. If you want to specify a custom instance type for your job, just add the "instance_type" parameter and set its value, for example:

src = ScriptRunConfig(source_directory=script_folder,
                      script='train.py',
                      arguments=args,
                      compute_target=amlarc_compute,
                      instance_type=my_instance,
                      environment=env)

amahab commented 1 year ago

@jiaochenlu

I'm running this notebook and specified the instance_type parameter as you mentioned above, but I'm getting an unexpected-argument error:

TypeError: __init__() got an unexpected keyword argument 'instance_type'

jiaochenlu commented 1 year ago

@amahab Sorry, my bad. The notebook I provided earlier is an SDK v1 example, and in SDK v1 you should specify the instance type with the following code:

src = ScriptRunConfig(source_directory=script_folder,
                      script='train.py',
                      arguments=args,
                      compute_target=amlarc_compute,
                      environment=env)
src.run_config.kubernetescompute.instance_type = "<your-instance-type-name>"

However, SDK v2 is now GA. You can find more SDK v2 examples here. To use Kubernetes compute and specify an instance type for a training job with SDK v2, please follow the example I provided before.
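
For readers landing on this thread, here is a minimal sketch of what the SDK v2 (azure-ai-ml) route might look like. It is not taken from the linked examples. To my understanding, the v2 command() builder accepts an instance_type keyword alongside compute. The source folder, environment name, compute name and instance type name below are all hypothetical placeholders, and submit() assumes the azure-ai-ml and azure-identity packages plus a workspace config.json are available.

```python
def training_job_settings(compute_name: str, instance_type: str) -> dict:
    """Keyword arguments intended for azure.ai.ml.command(); values are placeholders."""
    return {
        "code": "./src",                                   # hypothetical folder containing train.py
        "command": "python train.py",
        "environment": "azureml:my-training-env@latest",   # hypothetical registered environment
        "compute": compute_name,                           # name of the attached Kubernetes compute target
        "instance_type": instance_type,                    # instance type defined on the cluster
        "display_name": "k8s-instance-type-sample",
    }


def submit(settings: dict):
    """Build a command job from the settings and submit it to the workspace."""
    # Imported here so the settings above can be inspected without the SDK installed.
    from azure.ai.ml import MLClient, command
    from azure.identity import DefaultAzureCredential

    # Assumes a config.json describing the workspace is discoverable from the working directory.
    ml_client = MLClient.from_config(credential=DefaultAzureCredential())
    job = command(**settings)
    return ml_client.jobs.create_or_update(job)
```

Calling submit(training_job_settings("amlarc-compute", "my-instance-type")) would then submit the job to the Kubernetes compute target with the named instance type, assuming both names exist in your workspace and cluster.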