databricks / run-notebook

Apache License 2.0

libraries-json configuration not working #26

Open JamesBorg opened 1 year ago

JamesBorg commented 1 year ago

I'm trying to have a library installed on the cluster created for the run, but I'm running into the following error.

Error: {"error_code":"MALFORMED_REQUEST","message":"Could not parse request object: Expected 'START_OBJECT' not 'VALUE_STRING'\n at [Source: (ByteArrayInputStream); line: 1, column: 405]\n at [Source: java.io.ByteArrayInputStream@c6f971a; line: 1, column: 405]"}

Here is a copy of the workflow configuration:

name: Databricks notebook running test
on:
  workflow_dispatch:
  push:

env:
  DATABRICKS_HOST: https://******************.azuredatabricks.net
  NODE_TYPE_ID: Standard_NC6s_v3
  GITHUB_TOKEN: ${{ secrets.REPO_TOKEN }}

jobs:
  databricks_notebook_test:
    runs-on: ubuntu-20.04
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
      - name: Generate AAD Token
        run: ./.github/workflows/scripts/generate-aad-token.sh ${{ secrets.AZURE_SP_TENANT_ID }} ${{ secrets.AZURE_SP_APPLICATION_ID }} ${{ secrets.AZURE_SP_CLIENT_SECRET }}
      - name: Train model
        uses: databricks/run-notebook@v0
        id: train
        with:
          local-notebook-path: notebooks/test.py
          git-commit: ${{ github.event.pull_request.head.sha || github.sha}}
          libraries-json: >
            [
              { "pypi": "accelerate" }
            ]
          new-cluster-json: >
            {
              "spark_version": "11.1.x-gpu-ml-scala2.12",
              "num_workers": 0,
              "spark_conf": {
                "spark.databricks.cluster.profile": "singleNode",
                "spark.master": "local[*, 4]",
                "spark.databricks.delta.preview.enabled": "true"
              },
              "node_type_id": "${{ env.NODE_TYPE_ID }}",
              "custom_tags": {
                "ResourceClass": "SingleNode"
              }
            }
          access-control-list-json: >
            [
              {
                "group_name": "users",
                "permission_level": "CAN_VIEW"
              }
            ]
          run-name: testing github triggering of databricks notebook

The workflow runs through fine with the libraries-json configuration removed (and the necessary library installed from within the triggered notebook instead).

Is this a bug? Or am I misunderstanding how libraries-json can be used?

JamesBorg commented 1 year ago

Thanks to @vladimirk-db, who provided me with the solution.

I needed to modify the library entry to:

{ "pypi": { "package": "accelerate" } }
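
In other words, each entry's `pypi` key must map to an object with a `package` field, not directly to a string, which is what the `Expected 'START_OBJECT' not 'VALUE_STRING'` error was complaining about. With the fix applied, the `libraries-json` block from my workflow above would read (a sketch, using the same `accelerate` package from my example):

```yaml
libraries-json: >
  [
    { "pypi": { "package": "accelerate" } }
  ]
```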

Perhaps the README should be updated to reflect this?

motya770 commented 1 year ago

Also https://github.com/databricks/run-notebook/issues/46

benoitmiserez commented 5 months ago

Updated in #52