DataBiosphere / leonardo

Notebook service
BSD 3-Clause "New" or "Revised" License


Leonardo

leonardo serves as a way to launch compute within the Terra security boundary. It does so through several cloud hardware virtualization mechanisms, currently on Google Cloud Platform (GCP) and Azure.

leonardo supports launching the following services for compute:

Currently, leonardo supports launching custom Docker images for Jupyter and RStudio on virtual machines and Dataproc. It also supports launching applications in Kubernetes, with a focus on Galaxy.

We recommend consuming these APIs and this functionality through the Terra UI.

We use JIRA instead of the GitHub issues page. If you would like to see what we are working on, you can visit our active sprint or our backlog on JIRA. You will need to set up an account to access it, but it is open to the public.

Setting up a Java Client Library

Add the leonardo-client to your build. An example for sbt is below:

libraryDependencies += "org.broadinstitute.dsde.workbench" %% "leonardo-client" % "1.3.6-<git hash>"

Be sure to replace <git hash> with the first 7 characters of the commit hash at the HEAD of develop. You can find the available releases and their git hashes in Artifactory.
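If you script this step, the version string can be derived from a full commit hash (for example the output of git rev-parse origin/develop). The sketch below is a hypothetical helper, not part of leonardo or leonardo-client; it only assumes the "1.3.6-<first 7 characters>" pattern shown above, and the base version may have moved on since this was written.

```scala
// Hypothetical helper (not part of leonardo or leonardo-client): turns a full
// commit hash into the "1.3.6-<git hash>" version string described above.
object LeonardoClientVersion {
  val BaseVersion = "1.3.6"

  def fromCommitHash(commitHash: String): String = {
    val hash = commitHash.trim
    // A git commit hash is hexadecimal, and the version needs its first 7 characters.
    require(hash.length >= 7, s"commit hash too short: '$hash'")
    require(hash.forall(c => Character.digit(c, 16) >= 0), s"not a hex commit hash: '$hash'")
    s"$BaseVersion-${hash.take(7)}"
  }
}
```

For example, LeonardoClientVersion.fromCommitHash("0123456789abcdef0123") returns "1.3.6-0123456".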

Example Scala Usage:

import org.broadinstitute.dsde.workbench.client.leonardo.api.RuntimesApi
import org.broadinstitute.dsde.workbench.client.leonardo.ApiClient
import org.broadinstitute.dsde.workbench.client.leonardo.model.GetRuntimeResponse

class LeonardoClient(leonardoBasePath: String) {
  private def leonardoApi(accessToken: String): RuntimesApi = {
    val apiClient = new ApiClient()
    apiClient.setAccessToken(accessToken)
    apiClient.setBasePath(leonardoBasePath)
    new RuntimesApi(apiClient)
  }

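  def getAzureRuntimeDetails(token: String, workspaceId: String, runtimeName: String): GetRuntimeResponse = {
    // Use a different local name: `val leonardoApi = leonardoApi(token)` is a recursive value definition and does not compile
    val api = leonardoApi(token)
    api.getAzureRuntime(workspaceId, runtimeName)
  }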
}

Building and running leonardo locally

To run leonardo locally, you are going to need the following:

The following sections take you through those steps in a logical order.

Clone the repo and submodules

The first step is to get the code. This lets you follow this README locally, set up the environment dependencies, and build leonardo locally.

git clone https://github.com/databiosphere/leonardo.git
cd leonardo

This repository also uses git submodules, so you will need to run the following command as well:

git submodule update --init --recursive

Install leonardo's dependencies

The following tools are required to run leonardo:

Feel free to install each tool individually as suits your environment, or follow along with this process to set up your environment. Tool setup uses brew, and the Brewfile.lock.json gives us some consistency across environments.

At this point all the third party dependencies have been installed, and the environment variables necessary to support those tools have been set up.

Next up: interacting with the leonardo repository!

Identify and set up your MySQL database

Establish a local proxy to the Cloud SQL remote instance

  1. Run the following command to set up the gcloud CLI to work with ``

    gcloud auth application-default login

    NOTE: You may need to run gcloud config set project <PROJECT_ID> if your environment is set up to use a different Google Cloud project.

  2. Navigate your browser to the Cloud SQL dashboard:

    • Select your database's Instance Overview screen by clicking on its Instance ID, and then
    • In the Connect to this instance section, copy the Connection name.
  3. In the Cloud SQL dashboard for your instance, reset the passwords for the users ()

    • Select your database's Instance Overview screen by clicking on its Instance ID,
    • Select the Users option from the menu on the left,
    • Select the three vertical dots for the user, and then
    • Select Change password.

    NOTE: Update your environment (e.g. .zprofile, as above, or your local shell) with the correct username and password:

    export CLOUDSQL_INSTANCE=<your cloned db name> # for Leo and CloudSQL proxy
    export DB_USER=<db username> # for Leo only, not CloudSQL proxy
    export DB_PASSWORD=<db password> # for Leo only, not CloudSQL proxy
  4. Execute the following command in a terminal window to establish a local connection to the database. Note that you must be connected to the VPN.

    cloud-sql-proxy [CLOUD-SQL-CONNECTION-NAME-HERE]
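Before starting cloud-sql-proxy, a quick sanity check can catch a missing variable or a mis-copied connection name. The sketch below is hypothetical (CloudSqlPreflight and CloudSqlConnectionName are not part of leonardo); it assumes the PROJECT_ID:REGION:INSTANCE_ID format that Cloud SQL uses for the Connection name copied in step 2.

```scala
// Hypothetical pre-flight check (not part of the leonardo repo): confirms the
// variables exported in step 3 are set and that the value passed to
// cloud-sql-proxy looks like a Cloud SQL connection name.
final case class CloudSqlConnectionName(project: String, region: String, instance: String)

object CloudSqlPreflight {
  val RequiredVars: Seq[String] = Seq("CLOUDSQL_INSTANCE", "DB_USER", "DB_PASSWORD")

  // Names of required variables that are unset or blank.
  def missingVars(env: Map[String, String]): Seq[String] =
    RequiredVars.filter(name => env.get(name).forall(_.trim.isEmpty))

  // Connection names look like PROJECT_ID:REGION:INSTANCE_ID. Domain-scoped
  // project IDs contain a colon themselves, so the name is split from the right.
  def parseConnectionName(raw: String): Either[String, CloudSqlConnectionName] = {
    val parts = raw.trim.split(":", -1).toList
    parts.reverse match {
      case instance :: region :: projectRev if projectRev.nonEmpty && parts.forall(_.nonEmpty) =>
        Right(CloudSqlConnectionName(projectRev.reverse.mkString(":"), region, instance))
      case _ =>
        Left(s"expected PROJECT_ID:REGION:INSTANCE_ID, got '$raw'")
    }
  }

  def main(args: Array[String]): Unit = {
    val missing = missingVars(sys.env)
    if (missing.nonEmpty) println(s"Unset variables: ${missing.mkString(", ")}")
    // Optionally pass the connection name as the first argument to validate it.
    args.headOption.foreach(name => println(parseConnectionName(name)))
  }
}
```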

Other database info ...

You can add more vars for the CloudSQL proxy container by editing ./local/sqlproxy.env.


VPN

You must be connected to the VPN to complete the rest of this process.

Building local dependencies

Leo needs a local copy of the Go Helm library, plus the secrets, files, and env vars stored in k8s.

Overrides

By adding entries to ./local/overrides.env, you can override the value of any variable from k8s for Leo.

Unsetting

By adding entries to ./local/unset.env, you can remove variables retrieved from k8s for Leo. Unsets are applied after the variables are retrieved from k8s and before overrides are applied.
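The order described above (values from k8s, then unset.env, then overrides.env) can be modeled as two map operations. This is a minimal sketch, not the actual ./local tooling: the file format (one KEY=VALUE entry per line, bare names allowed in unset.env, blank lines and # comments ignored) is an assumption, and EnvLayering is a hypothetical name.

```scala
// Sketch of the k8s -> unset.env -> overrides.env layering. Assumed file format:
// one KEY=VALUE per line; unset.env may hold bare KEY names; blank lines and
// lines starting with # are ignored. Not taken from the leonardo tooling.
object EnvLayering {
  // Parses lines into (key, value) pairs; a line without '=' yields an empty value.
  def parseEnvLines(lines: Seq[String]): Seq[(String, String)] =
    lines.map(_.trim).filter(line => line.nonEmpty && !line.startsWith("#")).map { line =>
      val eq = line.indexOf('=')
      if (eq < 0) line -> "" else line.substring(0, eq).trim -> line.substring(eq + 1)
    }

  // Start from the k8s values, drop the unset keys, then apply the overrides.
  def resolve(fromK8s: Map[String, String], unsetKeys: Set[String], overrides: Map[String, String]): Map[String, String] =
    (fromK8s -- unsetKeys) ++ overrides
}
```

In this model the unset keys would be parseEnvLines(unsetLines).map(_._1).toSet and the overrides parseEnvLines(overrideLines).toMap. Because overrides are applied last, a key listed in both files ends up with its override value.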

Host alias

If you haven't already, add 127.0.0.1 local.dsde-dev.broadinstitute.org to /etc/hosts:

sudo sh -c "echo '127.0.0.1       local.dsde-dev.broadinstitute.org' >> /etc/hosts"
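To check whether the alias is already present (and avoid appending a duplicate line), a small program can scan /etc/hosts. HostsAliasCheck is a hypothetical name, and the parsing assumes the usual hosts format: an address followed by one or more whitespace-separated names, with # starting a comment.

```scala
// Hypothetical helper: checks whether /etc/hosts already maps 127.0.0.1 to the
// local Leo hostname, so the echo command does not have to be run twice.
object HostsAliasCheck {
  val Alias = "local.dsde-dev.broadinstitute.org"

  // True if any line (ignoring # comments) maps 127.0.0.1 to the alias; a hosts
  // line may list several names after the address.
  def hasAlias(hostsLines: Seq[String]): Boolean =
    hostsLines.exists { line =>
      line.takeWhile(_ != '#').trim.split("\\s+").toList match {
        case "127.0.0.1" :: names => names.contains(Alias)
        case _                    => false
      }
    }

  def main(args: Array[String]): Unit = {
    val source = scala.io.Source.fromFile("/etc/hosts")
    try println(if (hasAlias(source.getLines().toSeq)) "alias present" else "alias missing")
    finally source.close()
  }
}
```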

Run proxies

Run leonardo

Troubleshooting leonardo

Architecture issues

If you get an error like

Exception in thread "io-compute-6" java.lang.UnsatisfiedLinkError: Unable to load library 'helm':
...
(mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64')),
...

You are probably on an M1 (arm64) Mac running an amd64 (x86_64) build of Java. You can verify this by first finding and setting your JAVA_HOME (e.g. with which java, or jenv if present) and then checking the output of

file "${JAVA_HOME}/bin/java"

It should read something like

/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java: Mach-O 64-bit executable arm64

Look for Mach-O 64-bit executable arm64. If the output says x86_64 instead, install an arm64 build of Java and try again. Adoptium should work fine.
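The same check can be done from inside the JVM that sbt or IntelliJ actually uses. The sketch below reads the standard os.arch system property; the exact strings are JDK-dependent, but arm64 builds typically report aarch64, while x86_64 builds report x86_64 or amd64. JvmArchCheck is a hypothetical name.

```scala
// Hypothetical helper: reports whether the running JVM is an arm64 build,
// based on the standard os.arch system property.
object JvmArchCheck {
  // os.arch values are JDK-dependent; arm64 builds usually report "aarch64".
  def isArm64(osArch: String): Boolean = Set("aarch64", "arm64").contains(osArch.trim.toLowerCase)

  def main(args: Array[String]): Unit = {
    val arch = System.getProperty("os.arch")
    println(s"java.home=${System.getProperty("java.home")}, os.arch=$arch")
    if (!isArm64(arch))
      println("Not an arm64 JVM: on an M1 (arm64) Mac this can cause the helm UnsatisfiedLinkError.")
  }
}
```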

Verify that local Leo is running

Status endpoint: https://local.dsde-dev.broadinstitute.org/status

Swagger page: https://local.dsde-dev.broadinstitute.org
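If you prefer to check the status endpoint from code rather than a browser, here is a minimal sketch using the JDK's built-in java.net.http.HttpClient (Java 11+). LocalLeoStatus is a hypothetical name, and it prints the status code and body rather than assuming a response format. If the JVM does not trust the TLS certificate served locally, the request fails with an SSL error even when Leo is up; in that case the browser check is simpler.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.time.Duration

// Hypothetical smoke test for a locally running Leo. Requires the proxies, Leo,
// and the /etc/hosts alias from the steps above.
object LocalLeoStatus {
  val BaseUrl = "https://local.dsde-dev.broadinstitute.org"

  def statusUri(baseUrl: String): URI = URI.create(baseUrl.stripSuffix("/") + "/status")

  def main(args: Array[String]): Unit = {
    val client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build()
    val request = HttpRequest.newBuilder(statusUri(BaseUrl)).timeout(Duration.ofSeconds(10)).GET().build()
    val response = client.send(request, HttpResponse.BodyHandlers.ofString())
    // Print whatever /status returns instead of assuming a particular body format.
    println(s"HTTP ${response.statusCode()}: ${response.body()}")
  }
}
```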

Debugging in IntelliJ

  1. Install the EnvFile plugin
  2. Install the Scala plugin
  3. Set up a new Application run configuration in Run > Edit Configurations:

(You may need to use the "Modify options" dropdown to unlock options like "Environment variables", "EnvFile", and "Add VM options".) [Screenshot: Run configuration]

  4. Determine your Java home

The above configuration will fail to run properly because JAVA_HOME is missing from the environment; unfortunately, IntelliJ doesn't propagate it to the running app. To figure out the right value, first run the new configuration and scroll back up to the top of the output. The first line should look like this: [Screenshot: Run output]. In that example, JAVA_HOME should be set to /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home.

Now go back into the run configuration and add it to the "Environment variables" section. [Screenshots: Environment variables dialog; Run configuration fixed]

  5. Run it!
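To double-check the value from the Determine your Java home step, a small helper can derive JAVA_HOME from the path of the java binary (everything before /bin/java), or print the java.home property of the JVM it runs in; on JDK 9 and later, java.home is the JDK root directory. JavaHomeFinder is a hypothetical name.

```scala
import java.nio.file.Paths

// Hypothetical helper: derives JAVA_HOME from a path of the form <JAVA_HOME>/bin/java,
// such as the java binary at the start of IntelliJ's run output.
object JavaHomeFinder {
  def fromJavaBinary(javaBinary: String): Option[String] = {
    val path = Paths.get(javaBinary.trim)
    val isJava = Option(path.getFileName).exists(_.toString == "java")
    val binDir = Option(path.getParent).filter(dir => Option(dir.getFileName).exists(_.toString == "bin"))
    // JAVA_HOME is the parent of the bin directory.
    if (isJava) binDir.flatMap(dir => Option(dir.getParent)).map(_.toString) else None
  }

  def main(args: Array[String]): Unit =
    // On JDK 9+, java.home is the JDK root, i.e. the value JAVA_HOME should have.
    println(System.getProperty("java.home"))
}
```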

Run Tests in IntelliJ

To use the GUI elements to run tests, you need to change some run configuration templates:

  1. Set default ScalaTest runtime configuration options in Run > Edit Configurations

First, open the template settings. [Screenshot: Runtime configurations dialog] Then go to ScalaTest. [Screenshot: ScalaTest template settings] Open VM Options (labeled "1" in that screenshot) and add the JAVA_OPTS from Run Leonardo unit tests. [Screenshot: ScalaTest template JVM options] Open Environment variables (labeled "2") and uncheck Include system environment variables. [Screenshot: ScalaTest template env vars]

  2. Change Scala compiler options in IntelliJ settings

IntelliJ can't set compiler flags differently for the source and test targets. To work around this, open Settings > Build, Execution, Deployment > Compiler > Scala Compiler, select each module, and uncheck Enable warnings. [Screenshot: Scalac options]

NOTE: These changes may revert when you reload the sbt project! Repeat this step to fix tests that complain about warnings turned into errors. If you get errors after compilation but before the tests run, try deleting your test run configuration, running git clean -xfd -e .idea to clean project files, redoing the dependency and config setup, restarting IntelliJ, and redoing the above steps before rerunning the tests.

  3. Make sure the local MySQL server is running by following the instructions in Run Leonardo unit tests.
  4. Find a test and click the green arrow next to it to run it normally or under the debugger. [Screenshot: Test run pop-up]
  5. Run it!

You should see something like this: [Screenshot: Test run results]

Connecting to the MySQL database via the CloudSQL proxy

Once you've rendered the configs, started the CloudSQL proxy, and sourced the env vars required to run Leo, you can connect to your database with:

./local/proxies.sh dbconnect

Cleanup

When you're done, stop sbt (e.g. using Ctrl+C) and stop the proxies:

./local/proxies.sh stop

Run Leonardo unit tests

Make sure Docker is running, then spin up MySQL locally:

$ ./docker/run-mysql.sh start leonardo

If you see an error like

Warning: Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (113)
Warning: Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (113)
Warning: Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (113)

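Run docker system prune -a (note that this removes all stopped containers, unused networks, and unused images, not just Leo's). If the error persists, try restarting your laptop.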

Build Leonardo and run all unit tests.

export JAVA_OPTS="-Dheadless=false -Duser.timezone=UTC -Xmx4g -Xss2M -Xms4G"
sbt clean compile "project http" test
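The JAVA_OPTS above set a 4 GiB maximum (-Xmx4g) and initial (-Xms4G) heap, a 2 MiB thread stack size (-Xss2M), and UTC as the default time zone (-Duser.timezone=UTC). If you suspect they are not being picked up, a hypothetical check like the one below prints what the running JVM actually sees. It is only meaningful inside the JVM that received JAVA_OPTS: sbt's launcher script passes them to the JVM it starts, but forked test JVMs depend on the build's settings.

```scala
import java.util.TimeZone

// Hypothetical check: run it in the JVM whose options you want to inspect.
object JvmOptionsCheck {
  // Converts bytes to GiB. Runtime.maxMemory approximates -Xmx; the JVM may report somewhat less.
  def bytesToGiB(bytes: Long): Double = bytes.toDouble / (1L << 30)

  def main(args: Array[String]): Unit = {
    println(s"default time zone: ${TimeZone.getDefault.getID}") // expect UTC with -Duser.timezone=UTC
    println(f"max heap: ${bytesToGiB(Runtime.getRuntime.maxMemory)}%.2f GiB") // roughly 4 with -Xmx4g
  }
}
```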

You can also run a particular test suite, e.g.

sbt "testOnly *LeoAuthProviderHelperSpec"

or a particular test within a suite, e.g.

sbt 'testOnly *LeoPubsubMessageSubscriberSpec -- -z "handle Azure StopRuntimeMessage and stop runtime"'

where the argument to -z is a substring of the test name.

If you change the leonardo DB by adding a changeset XML file and then adding its path to the changelog file, you have to set initWithLiquibase = true in leonardo.conf for the change to be reflected in the unit tests. Once you are done testing, make sure to switch it back to initWithLiquibase = false, as leaving it on can do damage if you run local Leo against Dev!

Once you're done, tear down MySQL.

./docker/run-mysql.sh stop leonardo

Run docker restart leonardo-mysql if you see a java.sql.SQLNonTransientConnectionException: Too many connections error.

Run scalafmt

Learn more about scalafmt

Building Leonardo docker image

To install git-secrets:

brew install git-secrets

To make sure the git hooks run:

cp -r hooks/ .git/hooks/
chmod 755 .git/hooks/apply-git-secrets.sh

To build the jar and the leonardo Docker image:

./docker/build.sh jar -d build

To build the jar and the leonardo Docker image and push it to the broadinstitute/leonardo repos, tagged with the git hash:

./docker/build.sh jar -d push

GitHub Actions

Leonardo uses custom runners for GitHub Actions, because its workflows need more than the default 30GB provisioned by the ubuntu-latest GitHub runners.

There are 3 nodes, which you can view at https://github.com/DataBiosphere/leonardo/settings/actions/runners. They currently have 100GB. DevOps can be contacted to increase the size if needed, but at the time of writing we only need ~60GB.