Open yurii-kryvosheia opened 1 month ago
Please check the logs of the defender container in the browser pod. Browser pods should time out after 5 minutes by default; if they do not, there should be error messages in the defender container logs.
Fri, Aug 9, 2024, 10:53 yurii-kryvosheia @.***>:
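A minimal sketch of how the defender logs could be pulled for one browser pod. The helper name `defender_logs` is hypothetical, and the default namespace `moon` is an assumption taken from the commands elsewhere in this thread; pass the namespace your browser pods actually run in.

```shell
# Hypothetical helper: print logs of the "defender" container in a browser pod.
# $1 = browser pod name, $2 = namespace (assumed to default to "moon").
defender_logs() {
  kubectl -n "${2:-moon}" logs "$1" -c defender
}

# Example (not executed here): defender_logs playwright-chrome-XXXX
```

`kubectl logs POD -c CONTAINER` selects a single container in a multi-container pod, which is what is needed when the pod also runs other containers alongside defender.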
We are noticing numerous Chrome browser pods that are left running without termination.
Steps to reproduce:
- Run tests
- Wait for the browser pods to appear
- Restart the moon pod using the command kubectl -n moon rollout restart deployment moon or delete the pods.
- Stop tests
If you repeat these steps from the beginning, you will see an increasing number of idle, unterminated pods that need to be manually deleted. These pods are visible in the UI.
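To see whether leftover pods accumulate between repetitions of the steps above, the pods can be listed oldest first. This is only a sketch: the helper name `list_pods_oldest_first` is hypothetical and the `moon` namespace is assumed from the restart command in the steps.

```shell
# Hypothetical helper: list pods in a namespace, oldest first, so that
# pods surviving several test runs show up at the top of the output.
# $1 = namespace (assumed to default to "moon").
list_pods_oldest_first() {
  kubectl -n "${1:-moon}" get pods --sort-by=.metadata.creationTimestamp
}
```

Comparing the AGE column against the time the tests were stopped gives a rough idea of which browser pods outlived their session.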
@aandryashin What if we don't have "defender" container? 🤔 Just a "ca-certs" and a "browser".
That is not possible; every browser pod has a defender container running.
Here's the values for the moon2 chart, version 2.7.1. Nothing fancy.
licenseSecretName: moon-license
deployment:
  experimentalUI: true
  nodeSelector:
    scope: "moon"
  tolerations:
    - key: "Moon"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
customIngress:
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /ui/
    alb.ingress.kubernetes.io/success-codes: 200,201,404
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
  ingressClassName: "ops"
  host: "moon.domain.com"
  paths:
    - path: /wd/hub
      port: 4444
    - path: /playwright
      port: 4444
    - path: /devtools
      port: 4444
    - path: /api
      port: 9090
    - path: /ui
      port: 9090
ingress:
  enabled: false
browsers:
  default:
    annotations:
      karpenter.sh/do-not-disrupt: "true"
    nodeSelector:
      scope: "moon"
    tolerations:
      - key: "Moon"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
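And this is what the browser pod looks like (image.png: https://github.com/user-attachments/assets/38b03bbe-5411-4557-ab0d-c13f79a60c85).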
What could possibly be the issue?
OK, I see, you are using Playwright. Playwright pods are terminated when the client connection to Moon is closed. Moon has a graceful period (360 seconds by default) during which it waits for all client connections to close; if your Playwright tests run longer than that, you have to increase Moon's graceful shutdown period.
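One way to inspect a shutdown-related setting on the running deployment is sketched below. It reads the standard Kubernetes pod field `terminationGracePeriodSeconds` from the `moon` deployment. Whether Moon's graceful period corresponds to this field is an assumption that is not confirmed in this thread, and the helper name `moon_grace_period` is hypothetical.

```shell
# Hypothetical helper: print terminationGracePeriodSeconds of the "moon" deployment.
# Assumption: this Kubernetes field reflects the graceful period mentioned above.
# $1 = namespace (assumed to default to "moon").
moon_grace_period() {
  kubectl -n "${1:-moon}" get deployment moon \
    -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
}
```

If the printed value is shorter than the longest Playwright test, that would be consistent with the explanation above, but the chart documentation should be checked for the actual setting to change.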
Playwright pods are terminated when client connection to moon closed
Based on your response, if I terminate the moon pods, the connection is closed so browser pods should terminate. However, this is not happening.
No, Moon terminates a Playwright pod when the client connection is closed after the test is done; otherwise it waits for the graceful period to give the test a chance to complete. If Moon closed the connections and terminated the pods right away, you would get test failures...
The whole point of the issue is that moon doesn't terminate the browser pods whose tests are completed even after 5 minutes timeout.
@yurii-kryvosheia you don't get what we are trying to explain. In Selenium every command is a separate HTTP request, so the only way to detect browser pods that have been idle for a long time is to measure the time between HTTP requests. This is where the timeout comes into play. In Playwright / Puppeteer things work completely differently. All commands are transferred through one persistent WebSocket connection. This connection is closed when the Playwright tests finish. Moon watches such connections and automatically deletes the corresponding pods. Three possible reasons why this might not happen are:
1) The connection is not really closed because of frozen CI builds. Check that no Node.js processes remain alive.
2) The connection is closed on the load balancer (so the test considers that everything went well) but not closed between the load balancer and Moon. This is rare, but possible.
3) There is an issue deleting browser pods via the Kubernetes API. To check this hypothesis, try filtering the Moon logs by pod ID:
$ kubectl logs -lapp=moon -c moon -n moon 2>&1 | grep playwright-chrome-XXXX
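The checks for reasons 1 and 3 could be wrapped as below. This is a sketch with hypothetical helper names. One detail worth knowing: when `kubectl logs` is given a label selector, it prints only the last 10 lines per pod by default, so `--tail=-1` is added here to search the full log.

```shell
# Reason 1: list processes named exactly "node" on the CI machine.
# pgrep exits non-zero when nothing matches, i.e. no lingering test runner.
lingering_node_processes() {
  pgrep -l -x node
}

# Reason 3: search the full Moon logs for one browser pod ID.
# $1 = pod ID, e.g. playwright-chrome-XXXX (placeholder from the command above).
moon_logs_for_pod() {
  kubectl logs -lapp=moon -c moon -n moon --tail=-1 2>&1 | grep -F -- "$1"
}
```

`grep -F` treats the pod ID as a literal string, so characters in the ID are not interpreted as a regular expression.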