databrickslabs / dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
https://dbx.readthedocs.io

Command dbx sync dbfs does not respect the credentials passed thru env variables #506

Open kfot opened 1 year ago

kfot commented 1 year ago

Expected Behavior

dbx sync dbfs should respect the same standard environment variables as databricks-cli, and be consistent with dbx's own tutorial, which describes using the DATABRICKS_HOST and DATABRICKS_TOKEN env variables as an authentication alternative to the .databrickscfg file.

Current Behavior

With valid DATABRICKS_HOST and DATABRICKS_TOKEN sourced into the shell and the .databrickscfg file missing, the command raises the following error.

$ dbx sync dbfs --include test --dest /user/kfot
Usage: dbx sync dbfs [OPTIONS]
Try 'dbx sync dbfs --help' for help.
╭─ Error ──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Could not find a databricks-cli config for profile DEFAULT                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Your Environment

kfot commented 1 year ago

The problem lies in how the dbfs and repo functions handle the dbx default values. The profile parameter defaults to the string "DEFAULT":

def dbfs(  # the repo function declares the same parameter
    ...,
    profile: str = PROFILE_OPTION,
    ...,
):

(because PROFILE_OPTION references the databricks-cli default profile value...)

Then, within the dbfs and repo functions (two identical code sections), the value is passed to the get_databricks_config function, whose profile=None signature is misleading. In its body, the function treats our "DEFAULT" value as a valid, explicitly provided profile, which it clearly is not:

        if profile:
            config = ProfileConfigProvider(profile).get_config()  # <- our guy ends up here
        else:
            config = get_config()                                 # <- while it should go right here
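
One possible shape for a fix, as a rough sketch only: decide up front whether the "DEFAULT" value should count as an explicit profile. The helper name effective_profile and the DEFAULT_PROFILE constant are hypothetical and not part of dbx or databricks-cli; the sketch assumes that returning None would send the caller down the get_config() branch shown above.

```python
import os
from typing import Mapping, Optional

# Assumption: mirrors the "DEFAULT" string that the profile option falls back to.
DEFAULT_PROFILE = "DEFAULT"


def effective_profile(
    profile: Optional[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the profile to look up, or None to use the generic get_config() path."""
    has_env_credentials = bool(env.get("DATABRICKS_HOST")) and bool(
        env.get("DATABRICKS_TOKEN")
    )
    if profile == DEFAULT_PROFILE and has_env_credentials:
        # "DEFAULT" is only the option's fallback value here, not a user choice,
        # so let the environment variables win.
        return None
    return profile
```

A trade-off of this sketch is that an explicit --profile DEFAULT cannot be told apart from the option default, so env variables would take precedence in that case as well.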
doug-cresswell commented 1 year ago

Is there any progress on resolving this? This bug makes it very difficult to use dbx sync in a CI/CD setting, and it cost me significant troubleshooting time before I discovered this issue.
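
Until this is resolved, a possible workaround for CI/CD might be to generate the .databrickscfg file from the environment variables before calling dbx sync, so that the "DEFAULT" profile lookup succeeds. This is a minimal sketch, not an official recommendation; the function name write_databrickscfg is hypothetical, and it assumes the usual INI layout of the file with host and token keys under a [DEFAULT] section.

```python
import configparser
from pathlib import Path


def write_databrickscfg(path: Path, host: str, token: str) -> None:
    """Write a databricks-cli style config file with a single DEFAULT profile."""
    # interpolation=None keeps any '%' characters in the values literal.
    config = configparser.ConfigParser(interpolation=None)
    config["DEFAULT"] = {"host": host, "token": token}
    with path.open("w") as handle:
        config.write(handle)
    # The file holds a secret, so restrict it to the current user where supported.
    path.chmod(0o600)
```

In a pipeline step this could be called as write_databrickscfg(Path.home() / ".databrickscfg", os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"]) (after import os), just before the dbx sync dbfs command runs.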