artursouza closed this issue 10 months ago.
Listing the steps performed, as a log of what was done and as reference documentation in case this needs to be redone.
Setting up the aks-longhaul-release and aks-longhaul-weekly clusters in the new subscription:
# From https://github.com/dapr/test-infra/pull/203 and https://github.com/dapr/test-infra/blob/master/README.md
export SUBSCRIPTION_TO_BE_USED=INSERT_SUBSCRIPTION_UUID_HERE
export release_or_weekly='release' # use 'weekly' for weekly
export resourceGroup="aks-longhaul-${release_or_weekly}"
export DAPR_VERSION_TO_INSTALL='1.12.0'
export location=eastus
export clusterName=$resourceGroup
export MONITORING_NS=dapr-monitoring
# First, log in to the Dapr OSS subscription in your default browser
# Then, login on az CLI
az account clear && az login --output=none && az account set --subscription ${SUBSCRIPTION_TO_BE_USED}
az group create --name ${resourceGroup} --location ${location}
az deployment group create \
--resource-group ${resourceGroup} \
--template-file ./deploy/aks/main.bicep \
--parameters deploy/aks/parameters-longhaul-${release_or_weekly}.json
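An optional sanity check before continuing (not in the original log; 'main' is the default deployment name Azure derives from main.bicep):

```shell
# Should print 'Succeeded' once the deployment completes.
az deployment group show \
  --resource-group ${resourceGroup} \
  --name main \
  --query properties.provisioningState \
  --output tsv
```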
# We want to manually control Dapr setup, so let's remove the Azure-controlled Dapr ext.
az k8s-extension delete --yes \
--resource-group ${resourceGroup} \
--cluster-name ${clusterName} \
--cluster-type managedClusters \
--name ${clusterName}-dapr-ext
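To double-check the extension is gone (optional, not in the original log):

```shell
# The Dapr extension should no longer appear in this list.
az k8s-extension list \
  --resource-group ${resourceGroup} \
  --cluster-name ${clusterName} \
  --cluster-type managedClusters \
  --output table
```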
az aks get-credentials --admin --name ${clusterName} --resource-group ${resourceGroup}
# Just for good measure...
dapr uninstall -k
# Now to the helm chart upgrade
helm repo update && \
helm upgrade --install dapr dapr/dapr \
--version=${DAPR_VERSION_TO_INSTALL} \
--namespace dapr-system \
--create-namespace \
--wait
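Optionally verify the control plane came up healthy before restarting the apps (a sketch, not part of the original log):

```shell
# Read-only sanity checks on the freshly installed control plane.
dapr status -k
kubectl get pods -n dapr-system
```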
for app in "feed-generator-app" "hashtag-actor-app" "hashtag-counter-app" "message-analyzer-app" "pubsub-workflow-app" "snapshot-app" "validation-worker-app" "workflow-gen-app"; do
  kubectl rollout restart deploy/${app} -n longhaul-test || break
done
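The restarts above return immediately; to wait for them to actually complete, something like this could follow (a sketch, not in the original log):

```shell
# Block until each restarted deployment finishes rolling out.
for app in "feed-generator-app" "hashtag-actor-app" "hashtag-counter-app" "message-analyzer-app" "pubsub-workflow-app" "snapshot-app" "validation-worker-app" "workflow-gen-app"; do
  kubectl rollout status deploy/${app} -n longhaul-test --timeout=300s || break
done
```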
# From https://github.com/dapr/test-infra/blob/master/.github/workflows/dapr-longhaul-weekly.yml
kubectl get namespace | grep ${MONITORING_NS} || kubectl create namespace ${MONITORING_NS}
# Following https://docs.dapr.io/operations/observability/metrics/prometheus/#setup-prometheus-on-kubernetes
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
helm repo update && \
helm install dapr-prom prometheus-community/prometheus \
--namespace dapr-monitoring \
--create-namespace \
--wait
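A quick check that the Prometheus pods came up (optional, not in the original log):

```shell
# All dapr-prom pods should be Running.
kubectl get pods -n ${MONITORING_NS}
```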
This is being bypassed, as we fixed the dashboard code in dapr/dapr#7121. There is no need to install custom Prometheus settings. Rejoice.
# https://docs.dapr.io/operations/observability/metrics/grafana/#setup-on-kubernetes
helm repo add grafana https://grafana.github.io/helm-charts && \
helm repo update && \
helm upgrade --install grafana grafana/grafana \
--values ./grafana-config/values.yaml \
--namespace ${MONITORING_NS} \
--create-namespace \
--wait && \
kubectl get pods -n ${MONITORING_NS}
The steps here basically follow https://docs.dapr.io/operations/observability/metrics/grafana/#configure-prometheus-as-data-source
kubectl get secret --namespace dapr-monitoring grafana -o jsonpath={.data.admin-password} | base64 --decode | clip.exe
kubectl port-forward svc/grafana 8080:80 --namespace dapr-monitoring
Just follow https://docs.dapr.io/operations/observability/metrics/grafana/#configure-prometheus-as-data-source
Use the code from dapr/dapr#7121 or, if it is merged, from https://github.com/dapr/dapr/blob/master/grafana/
Remember: cat ... | clip.exe or cat ... | pbcopy is your friend.
Are they created by default by the Bicep template? - NO
Check current permissions. - They are too wide.
Check workflows for how they log in currently -- we want to avoid changing them much:
az login --service-principal -u ${{ secrets.AZURE_LOGIN_USER }} -p ${{ secrets.AZURE_LOGIN_PASS }} --tenant ${{ secrets.AZURE_TENANT }} --output none
In summary, we want a single service principal in the subscription, granted the correct role for those two clusters only.
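A sketch of how such a principal could be created, assuming the clusters are named after their resource groups as in the script above (these exact commands are not from the original log):

```shell
# Hypothetical sketch: create one service principal whose role assignment is
# scoped to the two longhaul clusters only, not to the whole subscription.
releaseScope=$(az aks show --name aks-longhaul-release \
  --resource-group aks-longhaul-release --query id --output tsv)
weeklyScope=$(az aks show --name aks-longhaul-weekly \
  --resource-group aks-longhaul-weekly --query id --output tsv)

# The Cluster User role is enough for 'az aks get-credentials' without --admin.
az ad sp create-for-rbac \
  --name test-infra-github-actions-longhaul-cluster-admin \
  --role "Azure Kubernetes Service Cluster User Role" \
  --scopes ${releaseScope} ${weeklyScope}
```

The command prints the appId, password, and tenant, which map to the AZURE_LOGIN_USER, AZURE_LOGIN_PASS, and AZURE_TENANT secrets used by the workflows.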
Service principal: test-infra-github-actions-longhaul-cluster-admin
Role: Azure Kubernetes Service Cluster User Role
This role is enough because the workflows run az aks get-credentials without the --admin flag. If we used that flag in our commands, we would need the Azure Kubernetes Service Cluster Admin Role, which grants access to the Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action API.
Role assignments:
- aks-longhaul-weekly: az aks get-credentials to this k8s cluster
- aks-longhaul-release: az aks get-credentials to this k8s cluster
Secrets: AZURE_TENANT, AZURE_LOGIN_USER, AZURE_LOGIN_PASS
Updated the secrets AZURE_TENANT, AZURE_LOGIN_USER, and AZURE_LOGIN_PASS with values that point to the service principal credential created above, on 2023-11-06 17:10 PST.
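If the secrets are updated from a terminal rather than the GitHub UI, the GitHub CLI can do it; a sketch (the repository name and shell variable names are assumptions):

```shell
# Hypothetical: push the new service principal credentials into the repo secrets.
gh secret set AZURE_TENANT --repo dapr/test-infra --body "${tenantId}"
gh secret set AZURE_LOGIN_USER --repo dapr/test-infra --body "${appId}"
gh secret set AZURE_LOGIN_PASS --repo dapr/test-infra --body "${password}"
```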
Old clusters dapr-longhaul-release and dapr-longhaul-weekly: their removal is tracked separately, in issue #210.
Added screenshot of the current status of the dashboard to #210.
I am closing this issue as the transition was done: the clusters are running in the OSS subscription and GitHub workflows trigger actions on these clusters. The removal of the old longhaul clusters is tracked separately on #210.
Longhaul tests are still running in a subscription that is only accessible by Microsoft employees. Both the release and nightly environments should be moved to the same Azure subscription we use for E2E tests.
Child of: https://github.com/dapr/test-infra/issues/156