google-github-actions / run-vertexai-notebook

A GitHub Action for running a Google Cloud Vertex AI notebook.
https://cloud.google.com/vertex-ai
Apache License 2.0
actions artificial-intelligence gcp github-actions google-cloud google-cloud-platform machine-learning ml-notebooks vertex-ai

run-vertexai-notebook GitHub Action

GitHub composite action to trigger asynchronous execution of a Jupyter Notebook via Google Cloud Vertex AI.

The typical SDLC for a Jupyter notebook keeps the notebook file in source control without its output cells. Storing notebooks this way is a best practice because it prevents committing potentially sensitive data. The downside is that code reviewers cannot see the output while reviewing and may not be able to accurately gauge the impact of a change.

The main purpose of this action is to provide a secure way to execute a notebook, store the output (outside of source control), and serve it to a reviewer with proper access controls.

This action relies on the notebook execution functionality of Google Cloud's Vertex AI to execute the notebook and store the executed notebook, including its output cells, in Google Cloud Storage. Access to the output is controlled by Google Cloud Storage ACLs.

NOTE: Notebooks executed by this action are subject to the notebook executor requirements defined by Vertex AI.

This action will provision cloud resources with associated costs so it is recommended that you control the usage of this action by:

This is not an officially supported Google product, and it is not covered by a Google Cloud support contract. To report bugs or request features in a Google Cloud product, please contact Google Cloud support.

Prerequisites

This action requires Google Cloud credentials to execute gcloud commands. See setup-gcloud for details.

Setup

  1. Create a new Google Cloud Project (or select an existing project) and enable the Vertex AI APIs.

  2. Create or reuse a GitHub repository for the example workflow:

    1. Create a repository.

    2. Move into the repository directory:

      $ cd <repo>
    3. Copy the example into the repository:

      $ cp -r <path_to>/notebook-review-action/examples/notebook-review/ .
  3. Create a GCS bucket if one does not already exist.

  4. Create a Google Cloud service account if one does not already exist.

  5. Add the following Cloud IAM roles to your service account:

    • roles/aiplatform.user - allows running jobs in Vertex AI
    • roles/storage.objectWriter - allows writing notebook files to object storage

    Note: These permissions are overly broad to favor a quick start. They do not represent best practices around the Principle of Least Privilege. To properly restrict access, you should create a custom IAM role with the most restrictive permissions.

  6. Set up authentication to Google Cloud using workload identity federation with the service account above.

Usage

    jobs:
      notebook-review:
        name: Notebook Review
        needs: changes
        runs-on: ubuntu-latest
        # Workload identity federation needs an OIDC token from GitHub.
        permissions:
          contents: 'read'
          id-token: 'write'

        steps:
        # Check out the repository before authenticating so the notebooks are in the workspace.
        - uses: 'actions/checkout@v3'

        - id: 'auth'
          name: 'Authenticate to Google Cloud'
          uses: 'google-github-actions/auth@v0'
          with:
            workload_identity_provider: 'projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider'
            service_account: 'my-service-account@my-project.iam.gserviceaccount.com'

        - id: notebook-review
          uses: google-github-actions/run-vertexai-notebook@v0
          with:
            gcs_source_bucket: '${{ env.GCS_SOURCE }}'
            gcs_output_bucket: '${{ env.GCS_OUTPUT }}'
            allowlist: '${{ needs.changes.outputs.notebooks_files }}'
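
The usage example above assumes a workflow-level env block that defines GCS_SOURCE and GCS_OUTPUT, and an earlier job named changes that outputs the list of changed notebooks. The sketch below shows one way to provide both. It assumes the third-party dorny/paths-filter action is used to detect changed .ipynb files; the trigger, the bucket names, and the csv list format are illustrative assumptions rather than requirements of this action, so check the allowlist and bucket input descriptions under Inputs for the exact formats expected.

    name: 'notebook-review'

    # Only start the workflow when a pull request touches a notebook.
    on:
      pull_request:
        paths:
          - '**.ipynb'

    env:
      GCS_SOURCE: 'my-notebook-source-bucket'  # hypothetical bucket name
      GCS_OUTPUT: 'my-notebook-output-bucket'  # hypothetical bucket name

    jobs:
      changes:
        runs-on: ubuntu-latest
        permissions:
          contents: 'read'
          pull-requests: 'read'
        outputs:
          notebooks_files: '${{ steps.filter.outputs.notebooks_files }}'
        steps:
        - uses: 'actions/checkout@v3'
        - id: 'filter'
          uses: 'dorny/paths-filter@v2'
          with:
            # Assumption: the allowlist input accepts a comma-separated list.
            list-files: 'csv'
            filters: |
              notebooks:
                - '**/*.ipynb'

The notebook-review job from the usage example goes under the same jobs key. Because of the paths filter on the trigger, the workflow, and therefore the billed Vertex AI execution, only starts when a pull request changes at least one notebook.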

Running R notebooks

R notebooks require a different base container and kernel, set with the vertex_container_name and kernel_name inputs:

    - id: notebook-review
      uses: google-github-actions/run-vertexai-notebook@v0
      with:
        gcs_source_bucket: '${{ env.GCS_SOURCE }}'
        gcs_output_bucket: '${{ env.GCS_OUTPUT }}'
        allowlist: '${{ needs.changes.outputs.notebooks_files }}'
        vertex_container_name: 'gcr.io/deeplearning-platform-release/r-cpu.4-1:latest' # R base container
        kernel_name: 'ir' # The stock R kernel
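
One way to handle a repository that holds both Python and R notebooks is to run the action twice, once per kernel. The sketch below assumes a hypothetical layout with Python notebooks under notebooks/python/ and R notebooks under notebooks/r/, and a changes job whose filters (for example, dorny/paths-filter filters named python_notebooks and r_notebooks) expose matching outputs. These filter and output names are illustrative and are not part of this action.

    # Hypothetical outputs: python_notebooks / r_notebooks are 'true' when the filter matched,
    # and the *_files outputs hold the matching file lists.
    - id: notebook-review-python
      if: needs.changes.outputs.python_notebooks == 'true'
      uses: google-github-actions/run-vertexai-notebook@v0
      with:
        gcs_source_bucket: '${{ env.GCS_SOURCE }}'
        gcs_output_bucket: '${{ env.GCS_OUTPUT }}'
        allowlist: '${{ needs.changes.outputs.python_notebooks_files }}'

    - id: notebook-review-r
      if: needs.changes.outputs.r_notebooks == 'true'
      uses: google-github-actions/run-vertexai-notebook@v0
      with:
        gcs_source_bucket: '${{ env.GCS_SOURCE }}'
        gcs_output_bucket: '${{ env.GCS_OUTPUT }}'
        allowlist: '${{ needs.changes.outputs.r_notebooks_files }}'
        vertex_container_name: 'gcr.io/deeplearning-platform-release/r-cpu.4-1:latest' # R base container
        kernel_name: 'ir' # The stock R kernel

Each step runs only when its own filter matched, so a pull request that changes only Python notebooks never starts the R container.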

See a more complete example in the examples directory.

Inputs