dapr / test-infra

Test apps and tools for Dapr
Apache License 2.0

Move longhaul to the same Azure subscription we use for E2E tests #167

Closed artursouza closed 10 months ago

artursouza commented 1 year ago

Longhaul tests are still running in a subscription that is only accessible to Microsoft employees. Both the release and nightly environments should be moved to the same Azure subscription we use for E2E tests.

Child of: https://github.com/dapr/test-infra/issues/156

tmacam commented 10 months ago

Listing the steps performed, both as a log of what was done and as reference documentation in case this needs to be redone.

Intro

Setup environment and credentials


# From https://github.com/dapr/test-infra/pull/203 and https://github.com/dapr/test-infra/blob/master/README.md

export SUBSCRIPTION_TO_BE_USED=INSERT_SUBSCRIPTION_UUID_HERE
export release_or_weekly='release' # use 'weekly' for weekly
export resourceGroup="aks-longhaul-${release_or_weekly}"
export DAPR_VERSION_TO_INSTALL='1.12.0'
export location=eastus
export clusterName=$resourceGroup
export MONITORING_NS=dapr-monitoring
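Every later step depends on the variables above, so a quick check before continuing can save a confusing failure. A minimal sketch, where `require_vars` is a hypothetical helper and not something from the repo:

```shell
# Hypothetical helper: fail early if any variable the later steps rely on is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # printenv only sees exported variables, which matches the `export` lines above.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing or empty: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Possible usage: `require_vars SUBSCRIPTION_TO_BE_USED resourceGroup DAPR_VERSION_TO_INSTALL location clusterName MONITORING_NS && echo "environment looks complete"`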

Log in to the OSS subscription

# First, log in to the Dapr OSS subscription in your default browser

# Then, log in with the az CLI
az account clear && az login --output=none && az account set --subscription ${SUBSCRIPTION_TO_BE_USED}
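Since the whole point is to land in the right subscription, it may be worth confirming what the az CLI has selected before creating anything. A sketch, assuming `SUBSCRIPTION_TO_BE_USED` holds the subscription UUID; `assert_subscription` is a hypothetical helper:

```shell
# Hypothetical helper: succeed only if the az CLI's active subscription is the expected one.
assert_subscription() {
    expected="$1"
    # `az account show` prints the currently selected subscription; we only ask for its id.
    actual="$(az account show --query id --output tsv)" || return 1
    if [ "$actual" != "$expected" ]; then
        echo "active subscription is ${actual}, expected ${expected}" >&2
        return 1
    fi
}
```

Possible usage: `assert_subscription "${SUBSCRIPTION_TO_BE_USED}" && echo "subscription OK"`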

Create new resource group

az group create --name ${resourceGroup} --location ${location}

Deploy clusters

az deployment group create \
    --resource-group ${resourceGroup} \
    --template-file ./deploy/aks/main.bicep \
    --parameters deploy/aks/parameters-longhaul-${release_or_weekly}.json
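If this needs to be redone against an existing resource group, previewing the changes first may be safer than deploying blindly. A sketch using `az deployment group what-if` with the same template and parameters; the wrapper name `preview_cluster_deployment` is hypothetical:

```shell
# Hypothetical wrapper: show what the Bicep deployment would change, without applying it.
preview_cluster_deployment() {
    az deployment group what-if \
        --resource-group "${resourceGroup}" \
        --template-file ./deploy/aks/main.bicep \
        --parameters "deploy/aks/parameters-longhaul-${release_or_weekly}.json"
}
```

Run `preview_cluster_deployment` from the repository root, the same place the deploy command above is run from.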

Remove Dapr AKS extension

# We want to manually control Dapr setup, so let's remove the Azure-controlled Dapr ext.
az k8s-extension delete --yes \
    --resource-group ${resourceGroup} \
    --cluster-name ${clusterName} \
    --cluster-type managedClusters \
    --name ${clusterName}-dapr-ext

Get cluster credentials

az aks get-credentials --admin --name ${clusterName} --resource-group ${resourceGroup}

Install latest stable Dapr on both clusters through Helm

# Just for good measure...
dapr uninstall -k

# Now to the helm chart upgrade
helm repo add dapr https://dapr.github.io/helm-charts/ && \
helm repo update && \
helm upgrade --install dapr dapr/dapr \
    --version=${DAPR_VERSION_TO_INSTALL} \
    --namespace dapr-system \
    --create-namespace \
    --wait
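To double-check that the cluster ended up on the intended version, one option is to look at the image tag of a control plane deployment. A sketch, assuming the `dapr-operator` image tag equals the chart version (the default as far as I know); `assert_dapr_version` is a hypothetical helper:

```shell
# Hypothetical helper: compare the dapr-operator image tag with the expected Dapr version.
assert_dapr_version() {
    expected="$1"
    # Assumption: the operator pod has a single container whose image tag is the Dapr version.
    image="$(kubectl get deployment dapr-operator \
        --namespace dapr-system \
        --output jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
    case "$image" in
        *":${expected}") echo "dapr-operator runs ${image}" ;;
        *) echo "expected tag ${expected}, found ${image}" >&2; return 1 ;;
    esac
}
```

Possible usage: `assert_dapr_version "${DAPR_VERSION_TO_INSTALL}"`. Running `dapr status -k` gives a similar overview of the control plane.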

Bounce the apps (we just re-installed Dapr)

for app in "feed-generator-app" "hashtag-actor-app" "hashtag-counter-app" "message-analyzer-app" "pubsub-workflow-app" "snapshot-app" "validation-worker-app" "workflow-gen-app"; do
    kubectl rollout restart deploy/${app} -n longhaul-test || break
done 
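`kubectl rollout restart` returns as soon as the restart is requested, so the loop above does not tell us whether the apps actually came back. A sketch that waits for each rollout to finish; the function name and the 300s timeout are assumptions:

```shell
# Hypothetical helper: block until each restarted deployment has finished rolling out.
wait_for_longhaul_apps() {
    for app in "$@"; do
        # Fails (and stops the loop) if a rollout does not complete within the timeout.
        kubectl rollout status "deploy/${app}" \
            --namespace longhaul-test \
            --timeout=300s || return 1
    done
}
```

Possible usage, with the same app list as above: `wait_for_longhaul_apps feed-generator-app hashtag-actor-app hashtag-counter-app message-analyzer-app pubsub-workflow-app snapshot-app validation-worker-app workflow-gen-app`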

Setup monitoring namespace (next steps require this)

# From https://github.com/dapr/test-infra/blob/master/.github/workflows/dapr-longhaul-weekly.yml
kubectl get namespace | grep ${MONITORING_NS} || kubectl create namespace ${MONITORING_NS}
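Note that the `grep` above matches substrings, so a namespace such as `dapr-monitoring-old` would also suppress the creation. An alternative sketch that is idempotent without grepping; `ensure_namespace` is a hypothetical helper:

```shell
# Hypothetical helper: create the namespace if missing, leave it untouched if it exists.
ensure_namespace() {
    # --dry-run=client only renders the manifest locally; `kubectl apply` then
    # creates the namespace or reports it as unchanged.
    kubectl create namespace "$1" --dry-run=client --output yaml | kubectl apply --filename -
}
```

Possible usage: `ensure_namespace "${MONITORING_NS}"`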

Install Prometheus through helm chart

# Following https://docs.dapr.io/operations/observability/metrics/prometheus/#setup-prometheus-on-kubernetes

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
helm repo update && \
helm install dapr-prom prometheus-community/prometheus \
    --namespace dapr-monitoring \
    --create-namespace \
    --wait

Install Prometheus custom setting

This step is skipped: the dashboard code was fixed in dapr/dapr#7121, so there is no need to install custom Prometheus settings. Rejoice.

Install Grafana through helm chart

# https://docs.dapr.io/operations/observability/metrics/grafana/#setup-on-kubernetes
helm repo add grafana https://grafana.github.io/helm-charts && \
helm repo update && \
helm upgrade --install grafana grafana/grafana \
    --values ./grafana-config/values.yaml \
    --namespace ${MONITORING_NS} \
    --create-namespace \
    --wait && \
kubectl get pods -n ${MONITORING_NS}

Configure grafana

The steps here follow https://docs.dapr.io/operations/observability/metrics/grafana/#configure-prometheus-as-data-source

Log in to grafana

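# Copy the Grafana admin password to the clipboard (clip.exe on WSL; use pbcopy on macOS)
kubectl get secret --namespace ${MONITORING_NS} grafana -o jsonpath="{.data.admin-password}" | base64 --decode | clip.exe
# Forward local port 8080 to Grafana, then open http://localhost:8080
kubectl port-forward svc/grafana 8080:80 --namespace ${MONITORING_NS}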

Register prometheus datasource

Just follow https://docs.dapr.io/operations/observability/metrics/grafana/#configure-prometheus-as-data-source

Import dashboards (from #7121)

Use the code from dapr/dapr#7121 or, if it is merged, from https://github.com/dapr/dapr/blob/master/grafana/

Remember: cat ... | clip.exe or cat ... | pbcopy is your friend.
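Which clipboard command exists depends on the machine (clip.exe on WSL, pbcopy on macOS). A small sketch that picks the first one available; `pick_clipboard_tool` is a hypothetical helper:

```shell
# Hypothetical helper: print the first command from the argument list that is installed.
pick_clipboard_tool() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    # None of the candidates were found.
    return 1
}
```

Possible usage: `clip_tool="$(pick_clipboard_tool clip.exe pbcopy)"`, then pipe the dashboard JSON into `"$clip_tool"`.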

Create credentials for both clusters

Initial checks:

Create service principal

Role: Azure Kubernetes Service Cluster User Role
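The exact commands used for this step were not recorded. A sketch of how it could be done with `az ad sp create-for-rbac`, assuming the role is scoped to the AKS cluster itself; the scope, the function name, and the service principal name passed as `$1` are all assumptions:

```shell
# Hypothetical helper: create a service principal with the role above, scoped to the cluster.
create_cluster_sp() {
    # Resolve the full resource ID of the AKS cluster to use as the role scope.
    cluster_id="$(az aks show \
        --name "${clusterName}" \
        --resource-group "${resourceGroup}" \
        --query id --output tsv)" || return 1
    # $1 is the display name for the new service principal (caller's choice).
    az ad sp create-for-rbac \
        --name "$1" \
        --role "Azure Kubernetes Service Cluster User Role" \
        --scopes "${cluster_id}"
}
```

The command prints the generated password only once, so capture the output right away.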

Update release and weekly workflow to work on new cluster

Test by creating credentials in personal fork (tmacam/dapr-test-infra)

Update secrets on GH with new credentials

Updated secrets AZURE_TENANT, AZURE_LOGIN_USER, AZURE_LOGIN_PASS with values that point to service principal credential created above on 2023-11-06 17:10 PST.
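The secrets were updated through GitHub; for reference, the same could be scripted with the GitHub CLI. A sketch, assuming the three values are already exported as environment variables with the same names; `set_login_secrets` and the repository argument are hypothetical:

```shell
# Hypothetical helper: push the three login secrets to a repository using the gh CLI.
set_login_secrets() {
    repo="$1"
    for name in AZURE_TENANT AZURE_LOGIN_USER AZURE_LOGIN_PASS; do
        value="$(printenv "$name")" || { echo "unset: ${name}" >&2; return 1; }
        # gh reads the secret value from stdin when --body is not given,
        # which keeps the value out of the process argument list.
        printf '%s' "$value" | gh secret set "$name" --repo "$repo" || return 1
    done
}
```

Trying it against a personal fork first, as done above, seems like a sensible precaution.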

Verify clusters and workflows are working as expected

Remove clusters in internal subscription

This is tracked separately, in issue #210.

tmacam commented 10 months ago

Added screenshot of the current status of the dashboard to #210.

I am closing this issue as the transition is done: the clusters are running in the OSS subscription and GitHub workflows trigger actions on these clusters. The removal of the old longhaul clusters is tracked separately in #210.