Integrated development environment for NaaVRE.
OS support:
Run these steps once, when setting up the environment.
To integrate the different components of NaaVRE, we use Git submodules:
git clone --recurse-submodules git@github.com:QCDIS/NaaVRE-dev-environment.git
If you get an error:
Cloning into 'NaaVRE-dev-environment'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
then you need to add your SSH key to your GitHub account. Follow the instructions here.
Check out the Git Submodules documentation.
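If you cloned the repository without --recurse-submodules, you can still fetch the submodules afterwards with a standard Git command:
# Initialize and fetch all submodules after a plain clone
git submodule update --init --recursive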
Install Conda following these instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
Set up a new conda environment and install the dependencies:
cd NaaVRE-dev-environment
conda env create -n naavre-dev --file environment.yml
conda activate naavre-dev
To install and enable pre-commit hooks, run:
conda activate naavre-dev
pre-commit install
ggshield auth login
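To check that the hooks are set up correctly, you can run them once against the whole repository:
# Run all pre-commit hooks on all files
pre-commit run --all-files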
During the initial setup, and after updating submodules/VREPaaS-helm-charts, run:
helm dependency build services/vrepaas/submodules/VREPaaS-helm-charts
To containerize cells from this dev environment, you need to set up a personal GitHub repository. It will be used to commit the cell code and to build and publish the container images: set CELL_GITHUB and CELL_GITHUB_TOKEN in ./services/naavre/helm/values-integration.yaml and ./services/naavre-dev/helm/values-dev.yaml.
The NaaVRE components are deployed by Tilt to a minikube cluster. There are two options for running minikube: using a pre-configured NaaVRE-dev-vm, or using a self-managed Minikube cluster.
If you are provided with a development VM, follow these instructions: Using the VM (for developers).
We use ingress-dns to access the resources deployed on the minikube cluster. To configure it, start minikube (minikube start), then follow step 3 of the minikube ingress-dns setup guide for your operating system.
For Linux, pick the configuration matching your distribution's DNS setup. To find the DNS setup, run head /etc/resolv.conf:
If it shows # Generated by NetworkManager, follow the Linux OS with Network Manager instructions.
If it shows # This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8)., run the following commands (systemd-resolved is not covered by the minikube documentation):
sudo mkdir /etc/systemd/resolved.conf.d
sudo tee /etc/systemd/resolved.conf.d/minikube.conf << EOF
[Resolve]
DNS=$(minikube ip)
Domains=~test
EOF
sudo systemctl restart systemd-resolved
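Once the minikube ingress-dns addon is running (see the next section), a query like the following should return the minikube IP, confirming that the DNS forwarding works:
resolvectl query naavre-dev.minikube.test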
Run these steps every time you want to start the dev environment.
(Skip this step if you are using a pre-configured VM (NaaVRE-dev-vm).)
minikube start --addons=ingress,ingress-dns
# Optional:
minikube dashboard --url
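As a quick sanity check, confirm that both ingress addons are enabled:
minikube addons list | grep ingress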
To enable metrics exporting from the ingress controller, patch the ingress-nginx-controller deployment and service with the following commands:
kubectl patch deployment ingress-nginx-controller -n ingress-nginx --patch-file services/kube-prometheus-stack/patches/ingress-nginx-controller-deployment-patch.yaml
kubectl patch service ingress-nginx-controller -n ingress-nginx --patch-file services/kube-prometheus-stack/patches/ingress-nginx-controller-service-patch.yaml
These patches are based on the following guide.
To check that the metrics are being exported, find the nodePort mapped to port 10254 (a kubectl one-liner for this is given after the example output). If, for example, the nodePort is 30361, you can access the metrics at http://naavre-dev.minikube.test:30361/metrics. The output should be similar to the following:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.971e-05
go_gc_duration_seconds{quantile="0.25"} 2.8325e-05
go_gc_duration_seconds{quantile="0.5"} 5.6258e-05
go_gc_duration_seconds{quantile="0.75"} 0.000102628
go_gc_duration_seconds{quantile="1"} 0.000131488
go_gc_duration_seconds_sum 0.001229649
go_gc_duration_seconds_count 20
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 99
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.398712e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 8.6640184e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.473744e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 638149
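A sketch for looking up that nodePort with kubectl, assuming the patched service exposes the metrics port as 10254:
kubectl get service ingress-nginx-controller -n ingress-nginx -o jsonpath='{.spec.ports[?(@.port==10254)].nodePort}'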
tilt up
This will open the Tilt dashboard in your browser, and deploy the services needed by NaaVRE.
After starting Tilt, we need to configure the connection between Argo and the VREPaaS. To that end, open a terminal and run:
token=$(kubectl get secret vre-api.service-account-token -o jsonpath='{.data.token}' | base64 -d); echo "Bearer $token"
Next, edit services/vrepaas/helm/values.yaml and add the output of the previous command (Bearer ey.....) to global.argo.token.
After updating the helm values, open the Tilt web interface, wait for the Tiltfile resource to update, and trigger a manual update on vrepaas-vreapi.
There are three options for starting NaaVRE:
Run the NaaVRE dev server locally (e.g. from a separate clone of the repository).
To that end, follow the instructions from https://github.com/QCDIS/NaaVRE/blob/main/README.md#development, creating the file export_VARS containing:
export API_ENDPOINT="https://naavre-dev.minikube.test/vre-api-test"
export ARGO_WF_SPEC_SERVICEACCOUNT="executor"
export CELL_GITHUB="<the repository you created above>"
export CELL_GITHUB_TOKEN="<the token to access this repo>"
export JUPYTERHUB_SINGLEUSER_APP="jupyter_server.serverapp.ServerApp"
export JUPYTERHUB_USER="user"
export MODULE_MAPPING_URL="https://raw.githubusercontent.com/QCDIS/NaaVRE-conf/main/module_mapping.json"
export NAAVRE_API_TOKEN="token_vreapi"
export PROJ_LIB="/venv/share/proj"
export SEARCH_API_ENDPOINT=""
export SEARCH_API_TOKEN=""
export VLAB_SLUG="n-a-a-vre"
export VRE_API_VERIFY_SSL="false"
(Fill in your values for CELL_GITHUB and CELL_GITHUB_TOKEN.)
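Before starting the dev server (the exact startup command is described in the NaaVRE README), load the variables into your shell, for example:
source export_VARS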
This option is recommended when developing NaaVRE Jupyter lab extensions, because it provides the fastest reloading on code changes.
Note: when using this option, NaaVRE should be accessed through the direct link. Launching virtual labs from the VREPaaS UI will not work.
Run a dev image of NaaVRE with Tilt, built from ./services/naavre/submodules/NaaVRE. This option deploys NaaVRE as a standalone Jupyter Lab service.
To that end, run the following command (make sure tilt up is running):
tilt enable n-a-a-vre-dev
This option is recommended when jointly developing NaaVRE and the VREPaaS, if you don't need to test the integration between NaaVRE and Jupyter Hub or Keycloak.
Note: when using this option, NaaVRE should be accessed through the direct link. Launching virtual labs from the VREPaaS UI will not work.
Similar to option 2, but NaaVRE is deployed through Jupyter Hub.
To that end, run the following command (make sure tilt up is running):
tilt enable n-a-a-vre-dev hub proxy user-placeholder user-scheduler
This option is recommended to test integration of NaaVRE with Jupyter Hub or Keycloak.
To test the integration of extra services, run:
tilt enable [n-a-a-vre-dev hub proxy user-placeholder user-scheduler] minio traefik square-root-v3 square-root-v2 kube-prometheus-stack-server kube-prometheus-stack-alertmanager
To reset the services, exit Tilt and run tilt down. To fully reset the minikube cluster, run minikube delete.
The dev services are only accessible locally, using the domain name naavre-dev.minikube.test (provided minikube ingress-dns was set up). This allows us to use insecure credentials to log in to the services.
https://naavre-dev.minikube.test/auth/
Account | Username | Password |
---|---|---|
Superuser (master realm) | admin | admin |
User (vre realm) | user | user |
https://naavre-dev.minikube.test/argowf/
Login through keycloak.
Account | Token |
---|---|
vre-api service account | Dynamic |
https://naavre-dev.minikube.test/k8s-secret-creator/1.0.0/ui/
Account | Token |
---|---|
Token authentication | token_ksc |
UI: https://naavre-dev.minikube.test/vreapp
Login through keycloak.
Admin interface: https://naavre-dev.minikube.test/vre-api-test/admin/
Account | Username | Password | Token |
---|---|---|---|
Administrator | admin | admin | |
API user | user | user | token_vreapi |
https://naavre-dev.minikube.test/n-a-a-vre-dev
No authentication.
This version of NaaVRE runs Jupyter Lab alone (i.e. without Jupyter Hub), and updates automatically when the NaaVRE code is changed. It is suited for testing NaaVRE features, but not for testing integration (in that case, see NaaVRE section below).
https://naavre-dev.minikube.test/n-a-a-vre-integration/
Login through keycloak.
This version of NaaVRE is controlled by Jupyter Hub, and is closer to the actual deployed version. However, it will not update automatically.
To show changes to the NaaVRE component in Tilt:
This is necessary because the Jupyter Lab pod is started dynamically by Jupyter Hub, which prevents Tilt from detecting when it should reload it. It is usually not necessary to reload the NaaVRE/hub and proxy resources, even if Tilt says it has changes.
http://naavre-dev.minikube.test/square-root/
This is a simple example showing how to do canary deployments. To test it enable both versions of the square-root-v2 and square-root-v3 services.
tilt enable square-root-v3 square-root-v2
To see that 50% of the requests go to each version, open your browser at http://naavre-dev.minikube.test/square-root/4 and send 10 requests (press F5 10 times).
Open the Tilt dashboard and check the logs of the 'canary-example' resources to see that the requests are being distributed between the two versions.
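Instead of refreshing the browser, you can also send the requests from a terminal:
# Send 10 requests to the canary example
for i in $(seq 10); do curl -s http://naavre-dev.minikube.test/square-root/4; echo; done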
To change the percentage of requests going to each version, edit the services/canary-example/canary-example-canary.yaml file: in the Ingress, look for the nginx.ingress.kubernetes.io/canary-weight annotation and set it to the desired value.
Enable the metrics server:
minikube addons enable metrics-server
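Once the metrics server is up, resource usage should be reported:
kubectl top nodes
kubectl top pods -A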
UI: https://naavre-dev.minikube.test/grafana/
UI: https://naavre-dev.minikube.test/prometheus/
Account | Username | Password | Token |
---|---|---|---|
Administrator | admin | prom-operator | |
If you have enabled the nginx ingress monitoring, you can check the 'ingress-nginx-endpoints' target in the Prometheus dashboard: https://naavre-dev.minikube.test/prometheus/targets
You can also import the Grafana dashboard for monitoring the ingress controller.
You can enable canary deployments with Flagger. The instructions below are adapted from the guide linked here.
Open the values.yaml file in the services/flagger/helm folder and make sure that the meshProvider is set to kubernetes:
meshProvider: kubernetes
Install the podinfo application:
helm upgrade -i podinfo podinfo/podinfo -f services/podinfo/helm/values.yaml --create-namespace -n test
Create the ServiceMonitor, MetricTemplate and the Canary resources by running:
kubectl apply -f services/podinfo/k8s-provider/
Check the podinfo tag version at https://naavre-dev.minikube.test/podinfo/ (it should be 6.6.3) and the metrics at http://naavre-dev.minikube.test/podinfo/metrics
If the canary deployment is set up correctly, you should see three services in the test namespace:
The podinfo service is the main service that will be used to access the podinfo application. The podinfo and podinfo-primary services are using the podinfo-primary pod. The podinfo-canary service is not using any pod.
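To list those services and check which pods back them:
kubectl get services -n test
kubectl get endpoints -n test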
Flagger will determine whether the canary deployment is healthy and, if it is, promote it to the primary deployment. To test this, change the image tag:
helm upgrade -i podinfo podinfo/podinfo -f services/podinfo/helm/values-update.yaml --create-namespace -n test
To watch the progress of the Canary resources, run:
watch kubectl get canaries --all-namespaces
To run everything in one command:
helm uninstall podinfo -n test ; kubectl delete -f services/podinfo/k8s-provider/ ; sleep 20 ; helm upgrade -i podinfo podinfo/podinfo -f services/podinfo/helm/values.yaml --create-namespace -n test && kubectl apply -f services/podinfo/k8s-provider/; sleep 35s ; helm upgrade -i podinfo podinfo/podinfo -f services/podinfo/helm/values-update.yaml --create-namespace -n test
To check the events of the podinfo canary:
kubectl describe canary podinfo -n test
If the canary deployment is successful, you should see events similar to this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Synced 7m28s flagger podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation
Normal Synced 6m58s (x2 over 7m28s) flagger all the metrics providers are available!
Normal Synced 6m58s flagger Initialization done! podinfo.test
Normal Synced 6m28s flagger New revision detected! Scaling up podinfo.test
Normal Synced 5m58s flagger Starting canary analysis for podinfo.test
Normal Synced 5m58s flagger Advance podinfo.test canary iteration 1/10
Normal Synced 5m28s flagger Advance podinfo.test canary iteration 2/10
Normal Synced 4m58s flagger Advance podinfo.test canary iteration 3/10
Normal Synced 4m28s flagger Advance podinfo.test canary iteration 4/10
Normal Synced 3m58s flagger Advance podinfo.test canary iteration 5/10
Normal Synced 28s (x6 over 3m28s) flagger (combined from similar events): Copying podinfo.test template spec to podinfo-primary.test
Finally, you can check the version of the podinfo application at https://naavre-dev.minikube.test/podinfo/ (it should be 6.7.0)
Before installing Flagger, make sure you have enabled Nginx monitoring as described in the nginx ingress monitoring section.
Open the values.yaml file in the services/flagger/helm folder and make sure that the meshProvider is set to nginx:
meshProvider: nginx
Run everything in one command:
helm uninstall podinfo -n test ; kubectl delete -f services/podinfo/nginx-provider/ ; sleep 20 ; helm upgrade -i podinfo podinfo/podinfo -f services/podinfo/helm/values.yaml --create-namespace -n test && kubectl apply -f services/podinfo/nginx-provider/; sleep 35; helm upgrade -i podinfo podinfo/podinfo -f services/podinfo/helm/values-update.yaml --create-namespace -n test
If successful, you should see the following events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Synced 3m15s flagger podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation
Normal Synced 3m5s (x2 over 3m15s) flagger all the metrics providers are available!
Normal Synced 3m5s flagger Initialization done! podinfo.test
Normal Synced 2m35s flagger New revision detected! Scaling up podinfo.test
Normal Synced 2m25s flagger Starting canary analysis for podinfo.test
Normal Synced 2m25s flagger Pre-rollout check acceptance-test passed
Normal Synced 2m25s flagger Advance podinfo.test canary weight 5
Warning Synced 105s (x4 over 2m15s) flagger Halt advancement no values found for nginx metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found
Normal Synced 95s flagger Advance podinfo.test canary weight 10
Normal Synced 85s flagger Advance podinfo.test canary weight 15
Normal Synced 75s flagger Advance podinfo.test canary weight 20
Normal Synced 5s (x7 over 65s) flagger (combined from similar events): Copying podinfo.test template spec to podinfo-primary.test
The different components of NaaVRE have their own Git repositories, which are included as submodules of the NaaVRE-dev-environment repository. In the context of the dev repo, these submodules are references to a commit in the component repo.
When in the root directory of this repo, git commands apply to the NaaVRE-dev-environment repo.
When in a submodule directory, git commands apply to the submodule repo.
For any development task, follow this cycle:
On GitHub, create an issue in the appropriate component repository
Create a branch linked to the issue (e.g. nnn-my-branch)
Checkout this branch in the submodule:
cd services/component/submodules/COMPONENT
git fetch origin
git checkout nnn-my-branch
Edit code in the submodule while checking the changes with Tilt
Note: during development, running git status in the NaaVRE-dev-environment root directory will show unstaged changes to the submodule, such as modified: submodule/COMPONENT (untracked content) or (new commits).
Commit and push changes from the submodule directory
On GitHub, create a pull request in the submodule repo
Once it is merged, update the submodule reference in the NaaVRE-dev-environment repo so that it points to the merged commit (see the sketch below).
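A possible sequence from the NaaVRE-dev-environment root, assuming the component's default branch is main:
# Point the submodule at the merged commit
cd services/component/submodules/COMPONENT
git checkout main && git pull
cd -
# Record the new submodule commit in the dev repo
git add services/component/submodules/COMPONENT
git commit -m "Update COMPONENT submodule"
git push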
If you get an error similar to Failed to pull image "qcdis/n-a-a-vre-laserfarm:v2.0-beta": rpc error: code = Unknown desc = context deadline exceeded in the continuous-image-puller logs, either:
reset the minikube cluster (run minikube delete and re-run the startup commands), or
run minikube image load qcdis/n-a-a-vre-laserfarm:v2.0-beta in your terminal.
Admin interface: http://127.0.0.1:9001/
Account | Username | Password | Token |
---|---|---|---|
Administrator | admin | password | |
Before installing Velero, you need to create an access key and a bucket. To do so, access the Minio UI (http://127.0.0.1:9001/).
Create an access key with the following id and secret:
aws_access_key_id = minio
aws_secret_access_key = minio123
After creating the key, you need to specify its Access Key Policy to allow Velero to access the bucket naavre-dev.minikube.test:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::naavre-dev.minikube.test"
]
},
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:DeleteObject",
"s3:GetObject",
"s3:ListMultipartUploadParts",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::naavre-dev.minikube.test/*"
]
}
]
}
Create a bucket named naavre-dev.minikube.test and add the following access policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"minio"
]
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::naavre-dev.minikube.test"
]
},
{
"Effect": "Allow",
"Principal": {
"AWS": [
"minio"
]
},
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:ListMultipartUploadParts",
"s3:PutObject",
"s3:AbortMultipartUpload"
],
"Resource": [
"arn:aws:s3:::naavre-dev.minikube.test/*"
]
}
]
}
Follow the instructions here to install Velero.
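The install command below reads credentials from ./services/velero/credentials-velero. If that file is not already present in your checkout, a minimal sketch matching the Minio access key created above (the AWS shared-credentials format used by the velero-plugin-for-aws) would be:
cat > services/velero/credentials-velero << 'EOF'
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
EOF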
velero install --provider aws --use-node-agent --plugins velero/velero-plugin-for-aws:v1.2.1 \
--bucket naavre-dev.minikube.test --secret-file ./services/velero/credentials-velero --backup-location-config \
region=minio,s3ForcePathStyle="true",s3Url=http://host.minikube.internal:9000
To backup a namespace:
velero backup create default-ns-backup --default-volumes-to-fs-backup --include-namespaces default --wait
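To confirm that the backup completed:
velero backup get
velero backup describe default-ns-backup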
For Velero to back up volumes with File System Backup (FSB), you must annotate every pod that contains volumes to back up. For Keycloak, we have annotated postgresql in helm_config/keycloak/values.yaml:
postgresql:
enabled: true
auth:
postgresPassword: fake_postgres_password
password: fake_password
annotations:
backup.velero.io/backup-volumes: pvc-volume,emptydir-volume
and for the JupyterHub single-user pods:
singleuser:
extraAnnotations:
backup.velero.io/backup-volumes: pvc-volume,emptydir-volume
cmd: ['/usr/local/bin/start-jupyter-venv.sh']
To restore a namespace:
velero restore create --from-backup default-ns-backup
Find the container running Minikube:
docker ps | grep k8s-provider-minikube
Access the container running Minikube:
docker exec -it <container-id> /bin/bash
Delete the Keycloak postgresql data directory:
rm -r /tmp/hostpath-provisioner/default/data-keycloak-postgresql-0/
Go to https://naavre-dev.minikube.test/auth/. If the DB is missing, you won't be able to log in or you will get an error message:
Unexpected Application Error!
Network response was not OK.
NetworkError@https://naavre-dev.minikube.test/auth/resources/9qowb/admin/keycloak.v2/assets/index-d73da1a7.js:67:43535
fetchWithError@https://naavre-dev.minikube.test/auth/resources/9qowb/admin/keycloak.v2/assets/index-d73da1a7.js:67:43710
Restore the namespace:
velero restore create --from-backup default-ns-backup --wait
Try again to log in to Keycloak.
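If the login still fails, you can inspect the restore status and any warnings:
velero restore get
velero restore describe <restore-name>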