databricks / databricks-sdk-py

Databricks SDK for Python (Beta)
https://databricks-sdk-py.readthedocs.io/
Apache License 2.0

node_type_id and instance_pool_id conflict when calling workspace_client.jobs.create() method #771

Open sivadotblog opened 1 month ago

sivadotblog commented 1 month ago

Description The issue arises when using workspace_client.jobs.create() to create a job with specific cluster settings. The method does not accept node_type_id and instance_pool_id together; it accepts only one of them. However, the JobCluster class, which is used to define the cluster settings, includes both node_type_id and instance_pool_id by default, and I can't remove them.

Reproduction Code Context The job_settings dictionary, which is passed to workspace_client.jobs.create(), includes a job_clusters key. This key uses the JobCluster class to define the cluster specifications. Here is a simplified version of the relevant code:

```python
from databricks.sdk.service.compute import AutoScale, DataSecurityMode
from databricks.sdk.service.jobs import JobCluster

# name, tasks, timeout_seconds, and workspace_client are defined elsewhere.
# job_clusters is built first so that job_settings can reference it.
job_clusters = [
    JobCluster(
        apply_policy_default_values=True,
        autoscale=AutoScale(max_workers=None, min_workers=1),
        custom_tags={'application-id': '0001818'},
        data_security_mode=DataSecurityMode.SINGLE_USER,
        driver_instance_pool_id='1220-224524-shore2-pool-0dlxvf9c',
        instance_pool_id='1220-224524-shore2-pool-0dlxvf9c',
        spark_version='15.4.x-scala2.12',
        spark_conf={'spark.databricks.delta.preview.enabled': True},
    )
]

job_settings = {
    "name": name,
    "tasks": tasks,
    "job_clusters": job_clusters,
    "timeout_seconds": timeout_seconds,
}

workspace_client.jobs.create(**job_settings)
```

Expected behavior The job is created with the instance pool ID provided.

Is it a regression? No. I also tested with 0.17.0.

Debug Logs

The job fails with the error databricks.sdk.errors.platform.InvalidParameterValue: "the field node id cannot be supplied when an instance pool id is provided."

As you can see, I am not passing node_type_id. The JobCluster class defaults it to None.

Additional context When I ran dir() on the JobCluster object, I saw that node_type_id and driver_node_type_id are set to None.
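A minimal sketch of why an attribute can show up in dir() without being sent in the request. It assumes the SDK's generated classes skip None fields when they serialize, which is how as_dict() appears to behave. ClusterSpecSketch is a hypothetical stand-in, not the real SDK class:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ClusterSpecSketch:
    """Hypothetical stand-in for the SDK's cluster settings class."""

    spark_version: Optional[str] = None
    instance_pool_id: Optional[str] = None
    node_type_id: Optional[str] = None
    driver_node_type_id: Optional[str] = None

    def as_dict(self) -> Dict[str, Any]:
        # Assumption: only fields that were explicitly set end up in the body.
        return {k: v for k, v in vars(self).items() if v is not None}


spec = ClusterSpecSketch(
    spark_version="15.4.x-scala2.12",
    instance_pool_id="1220-224524-shore2-pool-0dlxvf9c",
)

# The attribute exists on the object, which is why dir() lists it ...
has_attr = "node_type_id" in dir(spec)
# ... but it is left out of the serialized request body.
body = spec.as_dict()
print(has_attr, sorted(body))
```

If the real classes behave this way, a None default on node_type_id would not by itself put the field in the request, so the conflict would have to be introduced somewhere else.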

sivadotblog commented 1 month ago

UPDATE: I am able to get past this issue if I disable apply_policy_default_values. So I believe the policy defaults apply node_type_id and driver_node_type_id. But I can't create a policy without those fields, so I believe this is still an issue.
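A small simulation of the hypothesis above, in case it helps with triage. This only models the suspected behavior and is not a confirmed description of what the backend does. POLICY_DEFAULTS, resolve_cluster, and is_accepted are hypothetical names, and the exception is a local stand-in for the SDK error:

```python
from typing import Any, Dict

# Hypothetical policy defaults; a real policy lives in the workspace.
POLICY_DEFAULTS = {
    "node_type_id": "example-node-type",
    "driver_node_type_id": "example-node-type",
}


class InvalidParameterValue(Exception):
    """Local stand-in for the SDK error of the same name."""


def resolve_cluster(request: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in policy defaults (if requested), then check for conflicts."""
    resolved = dict(request)
    if resolved.get("apply_policy_default_values"):
        for key, value in POLICY_DEFAULTS.items():
            # Only fill fields the caller did not supply.
            resolved.setdefault(key, value)
    if "instance_pool_id" in resolved and "node_type_id" in resolved:
        raise InvalidParameterValue(
            "node_type_id cannot be supplied together with instance_pool_id"
        )
    return resolved


def is_accepted(request: Dict[str, Any]) -> bool:
    """Return True if the simulated validation passes."""
    try:
        resolve_cluster(request)
        return True
    except InvalidParameterValue:
        return False


pool_request = {"instance_pool_id": "1220-224524-shore2-pool-0dlxvf9c"}
with_defaults = is_accepted({**pool_request, "apply_policy_default_values": True})
without_defaults = is_accepted({**pool_request, "apply_policy_default_values": False})
print(with_defaults, without_defaults)
```

In this model the request is rejected only when the defaults are applied, which matches what I observe. Whether the real service fills in the defaults before it validates the pool fields is exactly the open question.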