sondrebouvet opened this issue 8 months ago
Thanks for reporting.
In the debug log, did you redact the host field, or was it empty in the trace?
It was empty, but we have confirmed that the hostname is correct and seemingly not related to the issue.
Can you share how you're invoking the CLI from the action? E.g. are you using a profile, or setting env vars, and if so, how?
We set the environment variables in the step and invoke databricks fs cp to copy to a DBFS location:
- name: Deploy .whl to Databricks DBFS
  env:
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    PACKAGE_NAME: ${{ inputs.package_name }}
    PACKAGE_FOLDER: ${{ inputs.package_folder }}
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  run: |
    # Copy wheel package
    databricks fs cp "dist/${{ env.PACKAGE_NAME}}" "dbfs:/FileStore/wheels/${{ env.PACKAGE_FOLDER }}${{ env.PACKAGE_NAME}}" --overwrite
If I understand correctly, you were previously using the legacy (Python) CLI in the same action, then replaced it with this one by using the setup-cli action, and it stopped working?
Note that there are expected incompatibilities between the legacy CLI and this one. The cp command, however, should be compatible between these versions.
Can you confirm that other API calls fail as well? E.g. you could include a step where you run:
databricks current-user me
This prints out the user you're logged in as (who owns the token). If the cp command fails, I expect that to fail as well.
The action setup looks good.
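For example, a minimal diagnostic step could look like the sketch below (the secret names are assumed to match the snippet above; adjust them to your workflow):
- name: Check Databricks auth
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  run: |
    # Prints the identity that owns the token; fails fast if authentication is broken
    databricks current-user me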
Some context that I should have included in the original issue: we had been using the legacy Python CLI up until about a month ago. We switched to the new CLI without changing the original copy command, as the syntax is identical. This worked fine until the latest release. It still works in our dev environment, for reasons not clear to us. As a sanity check, we have tested the following:
- Tested the token locally using the API and databricks-cli (older version): works fine.
- Tested a different API call using databricks-cli; we tried databricks clusters list, which did not work either, as expected.
- Tested GitHub Actions jobs using the API and the legacy CLI, with the same token and workspace: works fine.
Our temporary solution, for now, has been to use the legacy Python CLI. As mentioned before, we have previously used the new databricks-cli successfully. The error discussed in this issue first appeared when running a release in our production environment. Our initial thought was that this behavior was caused by a config mismatch between our dev and prod environments. However, this cannot be the case, as the legacy CLI as well as the API (using curl) still work fine in both environments.
Thanks for the additional information.
Can you confirm the last version of the new CLI that did work? I.e. did it work with v0.212.3 and start failing with v0.212.4? Are there other env vars at play, or perhaps a .databrickscfg when trying this locally?
If you have a concrete repro, as in, it works with version X but not with version Y, then we could bisect and look at what changed between those versions.
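A sketch of how the workflow could pin specific CLI versions to bisect, assuming the setup-cli action exposes a version input (adjust if you install the CLI differently):
- name: Install Databricks CLI (pinned)
  uses: databricks/setup-cli@main
  with:
    version: 0.203.1   # swap the version here to compare behaviour between releases
- name: Print CLI version
  run: databricks version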
Sorry for taking a while to get back to you. I have now looked through many older versions of databricks-cli using the setup composite action. I have found that version v0.200.1 works (by using reference 3f1981093bda661acaa5dccb3a191d3e146f6327 from the setup repo). I assume that some versions between v0.200.1 and the latest (tested v0.212.4) might also work. I will look into it further.
I tested multiple versions using the commands databricks clusters list and databricks fs cp <path> <dbfs_path>, both of which do not work in the latest version but do work in v0.200.1.
Further testing shows that the offending change was introduced between versions v0.203.1 and v0.203.2.
Thanks for digging in. And to confirm, once you're on v0.203.2, you see the error you included in the issue summary?
Almost. I think there have been some changes to the actual traceback, but the error is the same:
Run # Copy wheel package
Error: Response from server (403 Forbidden) {"error_code":403,"message":"user not found"}: json: cannot unmarshal number into Go struct field APIErrorBody.error_code of type string
Error: Process completed with exit code 1.
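To see the raw server response outside of the CLI's JSON parsing, a hedged sketch of a direct REST call with the same credentials (env var names assumed to match the action above, and DATABRICKS_HOST assumed to include the https:// scheme):
# SCIM "Me" endpoint: returns the user/service principal that owns the token
curl -sS \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "$DATABRICKS_HOST/api/2.0/preview/scim/v2/Me"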
Any updates here @pietern ?
Even with the old CLI, I'm facing this same issue with a user PAT. This used to work earlier; nothing has been changed in the config.
Weirdly enough, the GitHub Action works fine in our dev env. It fails only in our prod env.
I even tested the prod token from my local machine; below are my findings:
databricks jobs list --all --version=2.1
→ This works fine.
databricks jobs reset --json-file $jsoncontent --job-id $jobidtoedit --version=2.1
→ This weirdly fails, throwing the error below:
Error: Authorization failed. Your token may be expired or lack the valid scope
I'm confused by this behaviour. How would listing jobs work with the same token?
I'm using this to install the CLI:
- name: install-databricks-cli
  uses: microsoft/install-databricks-cli@v1.0.0
@sondrebouvet @pietern I would love to know your thoughts on this.
@Abdul-Arfat-Mohammed, if all versions of the CLI fail for you when authenticating with your production environment, I don't feel it's related to this issue. Our issue seems related to a change which occurred in version v0.203.2.
@sondrebouvet Thanks for the confirmation. Yes, I agree.
Update: Found the root cause using the --debug option; it's related to the user's permissions to use the service principal. I have updated my previous comment.
Hello, I'm trying to set up a similar workflow with a GitHub Action like the one mentioned by @sondrebouvet. I've noticed that the authentication issue started when I updated my VS Code Databricks extension to the preview version in order to use Databricks Asset Bundles.
My current flow requires uploading a wheel file to multiple workspaces, performing a databricks fs cp ... in several steps, using a different pair of DATABRICKS_TOKEN and DATABRICKS_HOST each time. If I don't commit databricks.yml to Git, the action runs with no issues. On the other hand, if I do, it seems that the CLI tries to connect to the default target workspace indicated in databricks.yml.
Could it be that some setting coming from that file is evaluated with higher priority than DATABRICKS_HOST? Right now, the only fix seems to be a script that runs databricks configure --token for each workspace I need to use.
@grazianom-tuidi you can try running the databricks auth describe command and provide the output here. It shows which auth is used, where the parameters are coming from, etc.
Sure, here's the output.
Unable to authenticate: default auth: azure-cli: cannot get access token: ERROR: Please run 'az login' to setup account. Config: host=https://adb-*****.azuredatabricks.net/
-----
Current configuration:
✓ host: https://adb-****.azuredatabricks.net/ (from bundle)
✓ profile: default
Right now, it seems to fail even for the first workspace due to az login being required.
I might have failed to revert things back to my previous settings; here's my action:
- name: Upload the wheels
  uses: actions/upload-artifact@v3
  with:
    name: Upload wheel
    path: dist/*.whl
- name: Run auth describe
  run: databricks auth describe
- name: Push to DBFS (Report)
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_REPORT }}
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN_REPORT }}
  run: |
    # Copy wheel files
    databricks fs mkdir dbfs:/FileStore/libraries
    for f in dist/*.whl; do
      databricks fs cp $f dbfs:/FileStore/libraries/test-latest-py3-none-any.whl --overwrite
    done
- name: Push to DBFS (dev)
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_DEV }}
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN_DEV }}
  run: |
    # Copy wheel files
    databricks fs mkdir dbfs:/FileStore/libraries
    for f in dist/*.whl; do
      databricks fs cp $f dbfs:/FileStore/libraries/test-latest-py3-none-any.whl --overwrite
    done
From the databricks auth describe command, it seems that the CLI is using the host defined by the bundle, so trying to override it with the env variable does not work. Is this the expected behaviour?
@grazianom-tuidi yes, this is the expected behaviour at the moment; see https://github.com/databricks/cli/issues/1358 for the details.
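For illustration, a minimal sketch of the kind of bundle configuration that pins the host the CLI then prefers (the bundle name, target name, and host value here are hypothetical):
bundle:
  name: my_bundle

targets:
  dev:
    default: true
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net
With a default target like this in databricks.yml, auth describe reports the host as coming "(from bundle)", as in the output above.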
I see. Since there's no way to override the bundle configs (aside from listing all possible targets in databricks.yml), I guess I'll stick to the old CLI, which seems to give priority to the variables defined in .databrickscfg even when bundle configs are set.
@sondrebouvet is the issue still present for you on the very latest CLI version?
Hi, I am taking over for @sondrebouvet here :) With the very latest CLI version, we still get an error with the databricks fs cp command, but the error itself changed:
# Copy wheel package
databricks fs cp "dist/${{ env.PACKAGE_NAME}}" "dbfs:/FileStore/wheels/${{ env.PACKAGE_FOLDER }}${{ env.PACKAGE_NAME}}" --overwrite
...
Error: Invalid access to Org: 5435654711470629
As before, the exact same setup works fine with the older CLI version and in our dev environment (with the latest CLI version).
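One way to narrow this down, reusing the auth describe suggestion from earlier in the thread (a sketch; the secret names are assumed), is to print the resolved auth in the failing prod job right before the copy step:
- name: Describe resolved auth (prod)
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  run: databricks auth describe
If the host or auth type reported there differs between dev and prod, that would point at where the "Invalid access to Org" mismatch comes from.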
One way to prioritize the .databrickscfg in the latest version of databricks-cli is to run the command from a folder that does not contain the databricks.yml. In the context of a GitHub Action, I solved it as follows:
# Create .databrickscfg
echo "[DEFAULT]" > .databrickscfg
echo "host = ${{ secrets.HOST }}" >> .databrickscfg
echo "token = ${{ secrets.TOKEN }}" >> .databrickscfg
cd ..
# Copy wheel files
for f in <repo_name>/dist/*.whl; do
  databricks fs cp $f dbfs:/<path> --overwrite
done
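A related variant (a sketch, assuming the CLI honours the DATABRICKS_CONFIG_FILE environment variable) is to write the profile to an explicit path and point the CLI at it, which avoids depending on where the file happens to land:
# Write the profile to an explicit path instead of the working directory
cat > "$RUNNER_TEMP/databrickscfg" <<EOF
[DEFAULT]
host = ${{ secrets.HOST }}
token = ${{ secrets.TOKEN }}
EOF
export DATABRICKS_CONFIG_FILE="$RUNNER_TEMP/databrickscfg"
# Still run from outside the bundle root so databricks.yml is not picked up
cd ..
for f in <repo_name>/dist/*.whl; do
  databricks fs cp "$f" dbfs:/<path> --overwrite
done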
Steps to reproduce the behavior
Using the composite GitHub Action 'setup-cli' (https://github.com/databricks/setup-cli), an existing pipeline fails. Installing databricks-cli at the latest version via curl yields the same issue, which indicates that the problem is not related to the composite action but rather to databricks-cli itself. The authentication method is a Databricks personal access token generated by a service principal. This token has been tested and is valid. The token can be used successfully in GitHub Actions using the Databricks API 2.0 endpoints directly. The issue is restricted to the newest version of databricks-cli. The legacy pip-based CLI works fine, thus eliminating the possibility of an incorrect setup of the workspace or token.
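For reference, a sketch of the curl-based install used for that check, assuming the standard install script from the setup-cli repository (may need sudo depending on the runner):
# Install the latest CLI release and confirm the version that ends up on PATH
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
databricks version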
OS and CLI version
GitHub Actions ubuntu-latest. Databricks CLI: v0.212.4
Is this a regression?
It appears to be a regression, as the pip-based databricks-cli works fine.
Debug Logs
Error: unexpected error handling request: json: cannot unmarshal number into Go struct field APIErrorBody.error_code of type string. This is likely a bug in the Databricks SDK for Go or the underlying REST API. Please report this issue with the following debugging information to the SDK issue tracker at https://github.com/databricks/databricks-sdk-go/issues. Request log: