databricks / cli

Databricks CLI

CLI seems to ignore -t target if -p profile is specified #1147

Closed kenmyers-8451 closed 8 months ago

kenmyers-8451 commented 9 months ago

Describe the issue

Hi, I think this is an issue with the CLI, though it is related to DAB (I don't believe the problem is on the DAB side, but sorry if I mislabeled this).

I'm trying to deploy a bundle from my local machine with databricks bundle deploy -p tst -t tst, which I expected to use the tst credentials defined in my ~/.databrickscfg and the tst target defined in bundle.yml (I know I could update this name; my team hasn't gotten around to it). However, this did not work. I got the error:

Error: cannot create job: You need the 'attach' permission on instance pool ...-pool-d20su468 to configure a job to run on it.

However, this is the instance pool variable I defined for the dev target. Here is an excerpt of my targets:

targets:
  dev:
    default: true
    variables:
      driver_instance_pool_id: "...-pool-d20su468"  
  tst:
    variables:
      driver_instance_pool_id: "...-pool-n0tzdlm8" 
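
For context, these variables are also declared once at the top level of the bundle so that each target can override them. A simplified sketch of that declaration (the description text is illustrative, not our exact file):

variables:
  driver_instance_pool_id:
    description: "Instance pool the job cluster driver runs on"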

So it seems that despite specifying both the tst profile and the tst target, it tried to use the default dev target's variables (it's as if -t were ignored when -p was specified).

If I run databricks bundle deploy -t tst without specifying the profile, I get this error:

Error: cannot create job: You need the 'attach' permission on instance pool ...-pool-n0tzdlm8 to configure a job to run on it.

So in this case it seems able to find the right target variable, but it is probably deploying with the dev profile, since those are the default credentials.

Steps to reproduce the behavior

My .databrickscfg file looks like this:

[DEFAULT]
host  = [dev host]
token = [dev token]

[tst]
host  = [tst host]
token = [tst token]

I know the tst token and credentials are correct because I can use other Databricks CLI commands with -p tst.

In our bundle, the driver_instance_pool_id variable is referenced in a job cluster definition. I won't share all of that, since I expect you can reproduce this with other variables.
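
For illustration only (not our exact config), the reference sits inside the job cluster's new_cluster block, roughly like this; the job name, spark_version, worker count, and worker pool ID are placeholders:

resources:
  jobs:
    example_job:
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            num_workers: 2
            instance_pool_id: "...-pool-xxxxxxxx"
            driver_instance_pool_id: ${var.driver_instance_pool_id}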

Expected Behavior

You should be able to use the -p and -t flags together with databricks bundle deploy (and presumably other commands), but you can't.

Actual Behavior

The target specified with -t seems to be ignored.

OS and CLI version

Databricks CLI v0.212.1

Is this a regression?

N/A

Debug Logs

% databricks bundle deploy -p tst -t tst --log-level=debug
18:15:49  INFO start pid=16987 version=0.212.1 args="databricks, bundle, deploy, -p, tst, -t, tst, --log-level=debug"
18:15:49 DEBUG Loading bundle configuration from: /Users/{user_id}/8451/3CM/3cm-auto-etl/bundle.yml pid=16987
18:15:49 DEBUG Apply pid=16987 mutator=seq
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=scripts.preinit
18:15:49 DEBUG No script defined for preinit, skipping pid=16987 mutator=seq mutator=scripts.preinit
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=ProcessRootIncludes
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=ProcessRootIncludes mutator=seq
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=InitializeVariables
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=DefineDefaultTarget(default)
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=LoadGitDetails
18:15:49 DEBUG Apply pid=16987 mutator=SelectTarget(tst)
18:15:49 DEBUG Apply pid=16987 mutator=seq
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=initialize
18:15:49  INFO Phase: initialize pid=16987 mutator=seq mutator=initialize
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=initialize mutator=seq
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=initialize mutator=seq mutator=InitializeWorkspaceClient
18:15:49 DEBUG Loading tst profile from /Users/{user_id}/.databrickscfg pid=16987 sdk=true
18:15:49 DEBUG Apply pid=16987 mutator=seq mutator=initialize mutator=seq mutator=PopulateCurrentUser
18:15:49 DEBUG Loading tst profile from /Users/{user_id}/.databrickscfg pid=16987 sdk=true
18:15:51 DEBUG GET /api/2.0/preview/scim/v2/Me
< HTTP/2.0 200 OK

I can share more if needed. You can see here that it runs SelectTarget(tst) and loads the tst profile, but despite this it errors out with the dev target's variable.

kenmyers-8451 commented 9 months ago

Another thing I've just tried is adding the profile to each target definition and removing the default specifier; this also did not work:

targets:
  dev:
    workspace:
      profile: "dev"
#    default: true
    variables:
      driver_instance_pool_id: "...-pool-d20su468"  
  tst:
    workspace:
      profile: "tst"
    variables:
      driver_instance_pool_id: "...-pool-n0tzdlm8"

Same error as before with it trying to use ...-pool-d20su468 during databricks bundle deploy -t tst.

Log:

% databricks bundle deploy -t tst --log-level=debug
18:51:43  INFO start pid=18310 version=0.212.1 args="databricks, bundle, deploy, -t, tst, --log-level=debug"
18:51:43 DEBUG Loading bundle configuration from: /Users/{user_id}/8451/3CM/3cm-auto-etl/bundle.yml pid=18310
18:51:43 DEBUG Apply pid=18310 mutator=seq
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=scripts.preinit
18:51:43 DEBUG No script defined for preinit, skipping pid=18310 mutator=seq mutator=scripts.preinit
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=ProcessRootIncludes
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=ProcessRootIncludes mutator=seq
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=InitializeVariables
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=DefineDefaultTarget(default)
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=LoadGitDetails
18:51:43 DEBUG Apply pid=18310 mutator=SelectTarget(tst)
18:51:43 DEBUG Apply pid=18310 mutator=seq
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=initialize
18:51:43  INFO Phase: initialize pid=18310 mutator=seq mutator=initialize
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=initialize mutator=seq
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=initialize mutator=seq mutator=InitializeWorkspaceClient
18:51:43 DEBUG Loading tst profile from /Users/{user_id}/.databrickscfg pid=18310 sdk=true
18:51:43 DEBUG Apply pid=18310 mutator=seq mutator=initialize mutator=seq mutator=PopulateCurrentUser
18:51:43 DEBUG Loading tst profile from /Users/{user_id}/.databrickscfg pid=18310 sdk=true
18:51:44 DEBUG GET /api/2.0/preview/scim/v2/Me
< HTTP/2.0 200 OK
...
< HTTP/2.0 200 OK
< <Streaming response> pid=18310 mutator=seq mutator=deploy mutator=seq mutator=seq mutator=deferred mutator=seq mutator=terraform:state-pull sdk=true
18:51:49  INFO Local state is the same or newer, ignoring remote state pid=18310 mutator=seq mutator=deploy mutator=seq mutator=seq mutator=deferred mutator=seq mutator=terraform:state-pull
18:51:49 DEBUG Apply pid=18310 mutator=seq mutator=deploy mutator=seq mutator=seq mutator=deferred mutator=seq mutator=deferred
18:51:49 DEBUG Apply pid=18310 mutator=seq mutator=deploy mutator=seq mutator=seq mutator=deferred mutator=seq mutator=deferred mutator=terraform.Apply
Deploying resources...
18:51:51 ERROR Error: terraform apply: exit status 1

Error: cannot create job: You need the 'attach' permission on instance pool ...-pool-d20su468 to configure a job to run on it.

  with databricks_job.fact_marketshare,
  on bundle.tf.json line 292, in resource.databricks_job.fact_marketshare:
 292:       }

kenmyers-8451 commented 9 months ago

A member of our team sort of figured this out. This might actually be an issue with using a variable called instance_pool_id. I didn't show all of the variables before, but there was an additional one:

targets:
  dev:
    default: true
    variables:
      instance_pool_id: "...-pool-oq2hxn6o" 
      driver_instance_pool_id: "...-pool-d20su468" 
  tst:
    variables:
      instance_pool_id: "...-pool-ewvowkgg" 
      driver_instance_pool_id: "...-pool-n0tzdlm8" 

instance_pool_id points to a spot-type pool used for the workers when defining the cluster, and driver_instance_pool_id points to a non-spot pool used for the driver.
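
In the cluster spec these feed the attributes of the same names; an illustrative fragment (not our exact config):

new_cluster:
  spark_version: "13.3.x-scala2.12"
  num_workers: 2
  instance_pool_id: ${var.instance_pool_id}                # workers, spot pool
  driver_instance_pool_id: ${var.driver_instance_pool_id}  # driver, non-spot pool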

We'd noticed in the past that instance_pool_id was getting ignored and seemed to be replaced with driver_instance_pool_id. What my teammate did was rename instance_pool_id to worker_instance_pool_id, and suddenly it worked. What might actually have been causing this error is that the dev driver_instance_pool_id was being slotted into the tst instance_pool_id as well (since we see "...-pool-d20su468" in the error on tst). The new targets look like this:

targets:
  dev:
    default: true
    variables:
      worker_instance_pool_id: "...-pool-oq2hxn6o" 
      driver_instance_pool_id: "...-pool-d20su468" 
  tst:
    variables:
      worker_instance_pool_id: "...-pool-ewvowkgg" 
      driver_instance_pool_id: "...-pool-n0tzdlm8" 
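
And correspondingly, the worker pool reference in the cluster definition changes to something like this (sketch, assuming the same job cluster layout as above):

  instance_pool_id: ${var.worker_instance_pool_id}
  driver_instance_pool_id: ${var.driver_instance_pool_id}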

So perhaps instance_pool_id is some sort of protected variable or causing a scope issue?

Regardless, databricks bundle deploy -p tst -t tst is working now with this one refactor. I will let you close this unless you want to investigate this new issue. (I had also thought about reporting the instance_pool_id being ignored/replaced before, but it wasn't breaking our builds like this was, so it wasn't high on our priorities.)

kenmyers-8451 commented 8 months ago

Closing this issue because we figured out the problem was on our company's end: a process was overwriting the variable after we set it.