Improves detection of PyPI package names in environment dependencies #1699

Closed: andrewnester closed this PR 2 months ago

andrewnester commented 3 months ago

Changes

Improves detection of PyPI package names in environment dependencies

Tests

Added unit tests
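
For context, "environment dependencies" here means the entries under a job environment spec, where the CLI has to tell PyPI requirement specifiers apart from local wheel paths. A minimal sketch of such a section (the package names below are placeholders, not from any real bundle):

environments:
  - environment_key: default
    spec:
      client: "1"
      dependencies:
        # A PyPI requirement specifier: resolved from PyPI at install time.
        - requests==2.25.1
        # A local wheel path: detected as a local file and uploaded by the CLI.
        - ./dist/my_pkg-0.1.0-py3-none-any.whl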

andrewnester commented 2 months ago

@GeroSalas could you please describe what the failure is and what error you get?

GeroSalas commented 2 months ago

@andrewnester sure!

Basically, I defined a variable ${var.signol_lib_package_version} with the value signol_lib-0.4.4-20240822+prod-py3-none-any.whl,

so I reference it dynamically in the YAML of the job tasks where I need it, like below:

libraries:            
    - whl: ${workspace.root_path}/files/dist/${var.signol_lib_package_version}
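
(For completeness, a sketch of how such a variable could be declared in the bundle configuration, using the wheel name quoted above; the description wording is an assumption:)

variables:
  signol_lib_package_version:
    description: Full file name of the signol_lib wheel  # assumed wording
    default: signol_lib-0.4.4-20240822+prod-py3-none-any.whl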

pyproject.toml

signol-lib = {path = "dist/signol_lib-0.4.4-20240822+prod-py3-none-any.whl"}

All the setup was working fine: job tasks executed and the compute installed the right libs. Then it suddenly stopped working, and in the CI/CD logs (AWS CodeBuild) I now see these two new lines:

Building signol_lib...
Uploading signol_lib-0.4.4-py3-none-any.whl...

And yes, the signol_lib package was definitely overwritten: this recent change picked up the wheel but referenced the wrong full wheel name (note the missing -20240822+prod segment in the uploaded file above) and re-uploaded it, so my tasks can no longer find the correct one.

andrewnester commented 2 months ago

@GeroSalas Just to confirm a few things:

  1. did it work on version 0.226.0?
  2. Do you have an artifacts section defined in your bundle configuration for this library (roughly what the sketch below shows)?
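
Such an artifacts section would look roughly like this (a sketch only; the artifact key and build command are assumptions, here assuming a Poetry build):

artifacts:
  signol_lib:
    type: whl
    path: .
    build: poetry build   # assumed build command
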
GeroSalas commented 2 months ago

@andrewnester

  1. Yes, it was working fine in v0.226.0
  2. No
andrewnester commented 2 months ago

@GeroSalas can you please share the full bundle YAML configuration? I can't seem to reproduce the error so far, so something else might be missing. Also, do you have a setup.py file in your bundle root directory?

mfleuren commented 2 months ago

@andrewnester

I'm experiencing a similar issue as @GeroSalas: none of our integration tests can be validated, deployed, or run. Everything was working fine in v0.226.0, but since updating to v0.227.0 it fails with: Error: Python wheel tasks require compute with DBR 13.3+ to include local libraries. Please change your cluster configuration or use the experimental 'python_wheel_wrapper' setting. See https://docs.databricks.com/dev-tools/bundles/python-wheel.html for more information. Rolling back to 0.226.0 for now, as that still works. (The error handling also seems incomplete/incorrect, since the DBR version has nothing to do with this.)
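
(For reference, the experimental setting mentioned in that error is enabled at the top level of the bundle configuration as below; shown for completeness only, not as a confirmed fix for this regression:)

experimental:
  python_wheel_wrapper: true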

We do have a setup.py in the bundle's root directory for another purpose, but we run the DAB from a direct Git reference. The branch the DAB is based on is passed in via a variable.

Relevant code snippets:

databricks.yml

bundle:
    name: wise

include:
    - resources/*.yml

variables:
  integration_branch:
    description: The source branch of the PR for the integration test
    default: development

targets:    
    integration:
      mode: production
      git:
          branch: ${var.integration_branch}
      workspace:
          host: <host>
          root_path: /Shared/${bundle.name}/${bundle.target}
      run_as:
          user_name: <user>

azure-pipelines.yml

resources:
    containers:
        - container: pycontainer
          image: databricksruntime/standard:10.4-LTS

steps:
    - script: |

          cd ./wise_bundle/wise

          export DATABRICKS_HOST=$(URL)
          export DATABRICKS_TOKEN=$(PAT)

          echo $(System.PullRequest.TargetBranch)

          if [ "$(System.PullRequest.TargetBranch)" == "refs/heads/development" ]
          then
            echo "Deploying bundles to the integration environment"

            # Get the source branch
            branch=$(System.PullRequest.SourceBranch)
            echo $branch
            branch_name=${branch#refs/heads/}
            echo $branch_name

            # Deploy integration workflow using the source branch
            databricks bundle validate -t integration
            databricks bundle deploy -t integration --force-lock --var="integration_branch=$branch_name"
            databricks bundle run -t integration <flow_name> --no-wait
          fi

      workingDirectory: $(Build.SourcesDirectory)
      target: pycontainer
      displayName: "Run integration test"
andrewnester commented 2 months ago

@mfleuren do you have a libraries or environments section in your DABs config files where you reference any libraries? Could you share this section?

mfleuren commented 2 months ago

@andrewnester we do have libraries defined for individual tasks, for instance:

- task_key: deploy_model_on_dsia
  depends_on:
      - task_key: daily_inference
  notebook_task:
      notebook_path: wise_bundle/wise/src/deployment/deploy_api
      source: GIT
  job_cluster_key: wise_etl_cluster
  max_retries: 1
  min_retry_interval_millis: 60000
  libraries:
      - pypi:
            package: ${var.msal-package}
      - pypi:
            package: ${var.requests-package}

Where the packages are defined centrally, in this case:

  requests-package:
    description: PyPI package
    default: requests==2.25.1
  msal-package:
    description: PyPI package
    default: msal==1.28.0

All defined packages are publicly available on PyPI.

andrewnester commented 2 months ago

Thanks for the details! Indeed, it's a bug on our side, which is fixed in this PR: https://github.com/databricks/cli/pull/1717. It will be released in the next CLI version.

mfleuren commented 2 months ago

@andrewnester awesome, thanks!