arcaputo3 opened 3 months ago
Note that destroying the target and/or manually deleting the .bundle
and retrying yields the same issue.
Thanks for reporting this issue. Can you share (a snippet of) your bundle configuration?
We didn't change the merge logic that affects these code paths so I expect an issue upstream.
Being able to reproduce this would be very helpful.
Sure, please see our databricks.yml below. Our sub-YAML files are purely representative of workflows.
```yaml
# This is a Databricks asset bundle definition for tjc_databricks.
# See [REDACTED] for documentation.
bundle:
  name: tjc-databricks
  git:
    origin_url: [REDACTED]
    # branch: main

artifacts:
  default:
    type: whl
    build: poetry build
    path: .

include:
  - workflows/*/*.yml

variables:
  environment:
    description: The environment of the workflow
    default: dev
  principal_user:
    description: The principal user to run in production
    default: [REDACTED]
  tjc_excelsior_version:
    description: The version of `tjc-excelsior` to use
    default: 1.0.8
  tika_ocr_version:
    description: The version of `tika-ocr` to use
    default: 0.1.6
  pause_status:
    description: The status of scheduling for jobs. Only unpauses for prod.
    default: PAUSED
  pause_status_file_sync:
    description: The status of allowing file notifications. Only pauses for dev.
    default: UNPAUSED
  limit:
    description: The limit to use for testing
    default: 10

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: [REDACTED]
    variables:
      environment: dev
      pause_status_file_sync: PAUSED
  test:
    mode: development
    workspace:
      host: [REDACTED]
      root_path: /Users/${var.principal_user}/.bundle/${bundle.name}/${bundle.target}
    run_as:
      user_name: ${var.principal_user}
    variables:
      environment: test
  prod:
    mode: production
    workspace:
      host: [REDACTED]
      root_path: /Users/${var.principal_user}/.bundle/${bundle.name}/${bundle.target}
    run_as:
      user_name: ${var.principal_user}
    variables:
      environment: prod
      pause_status: UNPAUSED
      limit: "-1"
```
Thanks for providing the config. I'm able to reproduce.
The underlying problem is that we changed how we store variable values. All values used to be cast into a string, so you could use YAML strings, integers, and bools interchangeably and it would work. We changed this to accommodate complex-valued variables and now they can assume any type. Mixing types at the YAML level is what's causing the issue here.
We'll investigate further and figure out how to support this better.
In the meantime, you can work around the issue by making all variable values explicit strings:
```yaml
variables:
  # ...
  limit:
    description: The limit to use for testing
    default: "10"
```

Note the quotes around the value `10`.
@pietern I'm also catching this error when setting up job timeouts via a variable.
I've tried putting 7200 in quotes but I'm still getting the error "cannot merge int with string". Any workarounds?
Thanks!
```yaml
targets:
  qa:
    mode: production
    workspace:
      host: http....
      root_path: /Workspace/TEST/.bundle/${bundle.name}/${bundle.target}
    variables:
      timeout_seconds: 7200
      warning_seconds: 5400
    run_as:
      service_principal_name: ${var.spn}
    resources:
      jobs:
        example_ingest:
          timeout_seconds: ${var.timeout_seconds}
          health:
            rules:
              - metric: RUN_DURATION_SECONDS
                op: GREATER_THAN
                value: ${var.warning_seconds}
```
OK, the workaround for my case is to remove timeout_seconds from the job template and set it only from the main bundle deployment file; in that case it works even without quotes.
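For reference, the workaround described above might look like the following sketch. This is a hypothetical illustration based on the config in this thread (the job name example_ingest and the value 7200 are taken from it), not a fix verified by the maintainers:

```yaml
# resources/example_ingest.yml: the generated job template,
# with the hard-coded timeout_seconds removed
resources:
  jobs:
    example_ingest:
      name: 'Example test ingest'
      max_concurrent_runs: 1
      # timeout_seconds intentionally omitted here

# databricks.yml: set the timeout only as a per-target override instead
targets:
  qa:
    resources:
      jobs:
        example_ingest:
          timeout_seconds: 7200
```

With no value in the base template, there is nothing for the target override to merge against, so no int/string type conflict can arise.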
@blood-onix This sounds like a different issue. Did you hard-code `timeout_seconds: <some integer>` in your base definition?
If you mean the job template: yes. The job was created manually via the UI and exported via databricks bundle generate, so timeout_seconds is set in the job template. The idea is to override the value for different targets, while dev uses the default value from the job template.
../resources/example_ingest.yml

```yaml
resources:
  jobs:
    example_ingest:
      name: 'Example test ingest'
      email_notifications:
        on_duration_warning_threshold_exceeded:
          - redacted@email.com
        no_alert_for_skipped_runs: false
      webhook_notifications: {}
      timeout_seconds: 3200
      max_concurrent_runs: 1
      tasks:
        - task_key: Ingest
```
This is helpful, thank you. Out of curiosity, is there a known reason why this only appears to be affecting us in the `prod` target and only via GitHub Actions? Our `dev` and `test` CI/CD works, and locally on Windows 11 I can successfully run `databricks bundle deploy -t prod`.
@arcaputo3 If the configuration you provided is complete, then it is because only the `prod` target overrides the variable value (with an incompatible type). For the other targets, it can use the default provided at the top level directly.
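Concretely: in the config above, the default `limit: 10` is a YAML integer, while the prod override `limit: "-1"` is a string, so the merge fails only when the prod target is deployed. Following the quoting workaround already suggested in this thread, keeping both sides the same type would look like this sketch:

```yaml
variables:
  limit:
    description: The limit to use for testing
    default: "10"      # quoted, so YAML parses it as a string

targets:
  prod:
    variables:
      limit: "-1"      # the override is also a string; types now match
```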
Describe the issue
Deploying via DABs through GitHub Actions fails within a production target for CLI v0.222.0.

Configuration
Workflow file:

Steps to reproduce the behavior
databricks bundle deploy -t <prod-target> via a GitHub action using the above workflow file.

Expected Behavior
Deployment should execute properly for any mode.

Actual Behavior
Deployment fails for mode: production and works for mode: development.

OS and CLI version
CLI: v0.222.0
OS: Ubuntu 22.04.4
Running on Windows 11 locally works fine for CLI v0.222.0.

Is this a regression?
Yes, using databricks/setup-cli@v0.221.1 fixes the issue.

Debug Logs