aerokube / moon

Browser automation solution for Kubernetes and Openshift supporting Selenium, Playwright, Puppeteer and Cypress
http://aerokube.com/moon/latest
Apache License 2.0
221 stars 20 forks

Browser pods not terminating #429

Open yurii-kryvosheia opened 1 month ago

yurii-kryvosheia commented 1 month ago

We are noticing numerous Chrome browser pods that are left running without termination.

Steps to reproduce:

  1. Run tests
  2. Wait for the browser pods to appear
  3. Restart the moon pod using the command kubectl -n moon rollout restart deployment moon or delete the pods.
  4. Stop tests

If you repeat these steps from the beginning, you will see an increasing number of idle, unterminated pods that need to be manually deleted. These pods are visible in the UI.
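
To make the growing number of leftover pods easier to spot between repetitions, the pods can be listed oldest-first. This is only a minimal sketch: the function name `list_pods_by_age` is hypothetical, and it assumes the browser pods are created in the same `moon` namespace as Moon itself, which may not match every installation.

```shell
# List pods oldest-first so that leftover browser pods stand out.
# Assumption: browser pods live in the same namespace as Moon ("moon" by default here).
list_pods_by_age() {
  namespace="${1:-moon}"
  kubectl get pods -n "$namespace" --sort-by=.metadata.creationTimestamp
}

# Usage: list_pods_by_age moon
```

Running it after each repetition of the steps above should show whether pods from earlier runs are still present.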

aandryashin commented 1 month ago

Please check the logs of the defender container in the browser pod. Browser pods should time out after 5 minutes by default; if they do not, there should be error messages in the defender container logs.
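
The check described above could look roughly like the following. The container name `defender` comes from this comment; the function name `defender_logs` and the default namespace `moon` are assumptions made for illustration.

```shell
# Print the logs of the "defender" container for one browser pod.
# Assumption: the pod lives in the "moon" namespace unless another one is given.
defender_logs() {
  pod="$1"
  namespace="${2:-moon}"
  kubectl logs "$pod" -c defender -n "$namespace"
}

# Usage (replace the placeholder with a real pod name):
#   defender_logs <browser-pod-name>
```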

yurii-kryvosheia commented 1 month ago

@aandryashin What if we don't have "defender" container? 🤔 Just a "ca-certs" and a "browser".

aandryashin commented 1 month ago

That is not possible; every browser pod has a defender container running.

yurii-kryvosheia commented 1 month ago

Here are the values for the moon2 chart, version 2.7.1. Nothing fancy.

    licenseSecretName: moon-license
    deployment:
      experimentalUI: true
      nodeSelector:
        scope: "moon"
      tolerations:
        - key: "Moon"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"

    customIngress:
      enabled: true
      annotations:
        alb.ingress.kubernetes.io/target-type: ip
        alb.ingress.kubernetes.io/healthcheck-path: /ui/
        alb.ingress.kubernetes.io/success-codes: 200,201,404
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
      ingressClassName: "ops"
      host: "moon.domain.com"
      paths:
        - path: /wd/hub
          port: 4444
        - path: /playwright
          port: 4444
        - path: /devtools
          port: 4444
        - path: /api
          port: 9090
        - path: /ui
          port: 9090
    ingress:
      enabled: false

    browsers:
      default:
        annotations:
          karpenter.sh/do-not-disrupt: "true"
        nodeSelector:
          scope: "moon"
        tolerations:
          - key: "Moon"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"

And this is what the browser pod looks like:

image

What could possibly be the issue?

aandryashin commented 1 month ago

OK, I see, you are using Playwright. Playwright pods are terminated when the client connection to Moon is closed. Moon has a graceful period (360 seconds by default) during which it waits for all client connections to close. If your Playwright tests run longer than that, you have to increase Moon's graceful shutdown period.
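
One related value that can be inspected with plain kubectl is the Kubernetes termination grace period on the Moon deployment's pod template. Whether this field is the same setting as the graceful period described above is an assumption, so treat the sketch below only as a starting point and consult the Moon documentation for the setting to change. The function name is hypothetical; the deployment name `moon` and namespace `moon` are taken from the restart command earlier in the thread.

```shell
# Print terminationGracePeriodSeconds from the Moon deployment's pod template.
# Assumption: this Kubernetes field reflects Moon's graceful shutdown period.
moon_termination_grace_period() {
  namespace="${1:-moon}"
  kubectl get deployment moon -n "$namespace" \
    -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
}

# Usage: moon_termination_grace_period moon
```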

yurii-kryvosheia commented 1 month ago

Playwright pods are terminated when client connection to moon closed

Based on your response, if I terminate the moon pods, the connection is closed so browser pods should terminate. However, this is not happening.

aandryashin commented 1 month ago

No, Moon terminates a Playwright pod when the client connection is closed after the test is done; otherwise it waits for the graceful period to give the test a chance to complete. If Moon closed connections and terminated pods itself, you would get test failures...

yurii-kryvosheia commented 1 month ago

The whole point of the issue is that Moon doesn't terminate browser pods whose tests have completed, even after the 5-minute timeout.

vania-pooh commented 1 month ago

@yurii-kryvosheia you don't get what we are trying to explain. In Selenium, every command is a separate HTTP request, so the only way to detect browser pods that have been idle for a long time is to measure the time between HTTP requests. This is where the timeout comes into play. In Playwright / Puppeteer things work completely differently. All commands are transferred through one permanent WebSocket connection. This connection is closed when the Playwright tests finish. Moon watches these connections and automatically deletes the corresponding pods. Three possible reasons why this might not happen are:

1) The connection is not really closed because of frozen CI builds. Just check that node.js processes do not remain alive.
2) The connection is closed at the load balancer (so the test considers that everything went fine) but not between the load balancer and Moon. This is rare, but possible.
3) There is an issue deleting browser pods via the Kubernetes API. To check this hypothesis, try filtering the Moon logs by pod ID:

$ kubectl logs -lapp=moon -c moon -n moon 2>&1 | grep playwright-chrome-XXXX
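
For the first possibility in the list above (test processes that never exit), a quick check on the CI machine might look like the sketch below. It assumes the Playwright tests run as a process named exactly `node` and that `pgrep` is available there; the function name `check_node_processes` is hypothetical.

```shell
# Report Node.js processes that are still running on this machine.
# pgrep -x matches the process name exactly; -l prints the name next to the PID.
# pgrep exits with a non-zero status when nothing matches.
check_node_processes() {
  if pgrep -x -l node; then
    echo "node processes still running: the Playwright connection may still be open"
  else
    echo "no node processes found"
  fi
}

# Usage: run on the CI agent after the tests are supposed to have finished:
#   check_node_processes
```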