alcounit / selenosis

Scalable, stateless selenium hub for Kubernetes cluster
Apache License 2.0

Some browser pods stay forever #35

Closed shlomitsur closed 3 years ago

shlomitsur commented 3 years ago

Hello, there are some Chrome pods that are still present after days. [image]

$ kubectl describe pod chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625 -n selenosis
Name:         chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625
Namespace:    selenosis
Priority:     0
Node:         ip-10-10-28-62.us-west-2.compute.internal/10.10.28.62
Start Time:   Fri, 16 Apr 2021 14:01:19 +0300
Labels:       selenosis.app.type=browser
              session=chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625
              type=browser
Annotations:  capabilities: {"browserName":"chrome","browserVersion":"89.0","testName":""}
              kubernetes.io/psp: eks.privileged
Status:       Failed
IP:           10.10.24.164
IPs:
  IP:  10.10.24.164
Containers:
  browser:
    Container ID:   docker://78ae6f7b30ce3e2b5879b58faff4e4f8a3897dbc4874d4bf152f7c646bdd3488
    Image:          543052680787.dkr.ecr.us-west-2.amazonaws.com/chrome:89.0
    Image ID:       docker-pullable://543052680787.dkr.ecr.us-west-2.amazonaws.com/chrome@sha256:0d205b40d563e4c851a90baf8e2bcc44a5dd691f30c3716e7da0741ef5f412dd
    Ports:          5900/TCP, 4444/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Fri, 16 Apr 2021 14:01:22 +0300
      Finished:     Fri, 16 Apr 2021 14:01:36 +0300
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:        500m
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /dev/shm from dshm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tcqrc (ro)
  seleniferous:
    Container ID:  docker://6b4f7d89834e78ced028c17366d40e8c7b83c942b99b9b2d3a6be9c505cda66a
    Image:         543052680787.dkr.ecr.us-west-2.amazonaws.com/seleniferous:v0.0.3-develop
    Image ID:      docker-pullable://543052680787.dkr.ecr.us-west-2.amazonaws.com/seleniferous@sha256:eba605c5311f37766fc37435b94f24b4bf5dbc05ffe6b69a5054d31447911638
    Port:          4445/TCP
    Host Port:     0/TCP
    Command:
      /seleniferous
      --listhen-port
      4445
      --proxy-default-path
      /session
      --idle-timeout
      5m0s
      --namespace
      selenosis
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 16 Apr 2021 14:01:24 +0300
      Finished:     Fri, 16 Apr 2021 14:01:37 +0300
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tcqrc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  default-token-tcqrc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tcqrc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
$ kubectl logs -f chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625 -n selenosis browser
20
2021/04/16 11:01:22 [INIT] [Listening on :7070]
Logging to: /dev/null
Waiting X server...
....
....
Waiting X server...
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 4444
All remote connections are allowed. Use an allowlist instead!
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
$ kubectl logs -f chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625 -n selenosis seleniferous
{"level":"info","msg":"starting seleniferous v0.0.3-develop","time":"2021-04-16T11:01:24Z"}
{"level":"info","msg":"kubernetes client created","time":"2021-04-16T11:01:24Z"}
{"level":"info","msg":"new session request","request":"POST /wd/hub/session","request_by":"selenosis-676c87fb57-mt4w2","request_id":"5062016a-6538-460b-be5f-560078f59a92","time":"2021-04-16T11:01:33Z"}
{"level":"info","msg":"new session request completed: 17bdb29166f2c81cddb148ca7b7941a1","request":"POST /wd/hub/session","request_by":"selenosis-676c87fb57-mt4w2","request_id":"5062016a-6538-460b-be5f-560078f59a92","time":"2021-04-16T11:01:34Z"}
{"level":"info","msg":"proxy session","request":"POST /wd/hub/session/chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625/timeouts/implicit_wait","request_by":"selenosis-676c87fb57-g722l","request_id":"270bde6c-28ec-4dd8-ba39-5e9f6a130438","time":"2021-04-16T11:01:34Z"}
{"level":"info","msg":"proxy session","request":"POST /wd/hub/session/chrome-89-0-11bc53f9-89c3-45b2-be70-db310202f625/url","request_by":"selenosis-676c87fb57-g722l","request_id":"5a5538a9-a6b5-4cad-b8eb-369093cb4fd3","time":"2021-04-16T11:01:34Z"}
{"level":"warning","msg":"stopping seleniferous","time":"2021-04-16T11:01:36Z"}

Thank you

alcounit commented 3 years ago

Hi @shlomitsur, thanks for the feedback. I will try to figure out where the problem is.

alcounit commented 3 years ago

It looks like the browser container has a problem: Exit Code 143 indicates that the container was terminated after receiving SIGTERM. I've added correct handling of the stop signal to the seleniferous container. Can you please update seleniferous to alcounit/seleniferous:v1.0.0?
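For readers wondering how 143 maps to SIGTERM: by common shell and container-runtime convention, an exit code above 128 means "128 plus the signal number", and SIGTERM is signal 15. The following is a minimal Go sketch of that arithmetic. The function name `signalFromExitCode` is hypothetical and is not part of selenosis or seleniferous.

```go
package main

import (
	"fmt"
	"syscall"
)

// signalFromExitCode is a hypothetical helper (not from selenosis).
// It assumes the usual convention that exit codes in the range 129..255
// encode 128 + signal number, and returns false for any other code.
func signalFromExitCode(code int) (syscall.Signal, bool) {
	if code <= 128 || code > 255 {
		return 0, false
	}
	return syscall.Signal(code - 128), true
}

func main() {
	// 0 and 143 are the two exit codes shown in the pod description above.
	for _, code := range []int{0, 143} {
		if sig, ok := signalFromExitCode(code); ok {
			fmt.Printf("exit code %d: terminated by signal %d\n", code, int(sig))
		} else {
			fmt.Printf("exit code %d: not a signal exit\n", code)
		}
	}
}
```

Under this convention the browser container's 143 decodes to signal 15 (SIGTERM), while the seleniferous container's 0 is an ordinary clean exit.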

shlomitsur commented 3 years ago

Sure, thanks @alcounit. I'll deploy and report back.

shlomitsur commented 3 years ago

Btw, Chrome 90 is out with some interesting CPU improvements: "For Mac, we’re seeing up to 65% improvement in Energy Impact when active tabs are prioritized over tabs you aren’t using. This means up to 35% reduction in CPU usage and up to 1.25 more hours of battery life, with similar results on Windows, Chrome OS and Linux. And on Android, Chrome starts up 13% faster even with lots of tabs open." https://blog.google/products/chrome/more-helpful-chrome-throughout-your-workday/

shlomitsur commented 3 years ago

5 hours after deploying, it's looking good. [image] Thanks @alcounit

alcounit commented 3 years ago

@shlomitsur great.

shlomitsur commented 3 years ago

Hi @alcounit, I checked today and there are some stubborn pods that are hanging: [image] Interestingly, the stuck pods are in the 'Running' state and not 'Terminating' like before. [image]

alcounit commented 3 years ago

Hi @shlomitsur, please share the logs from seleniferous and selenosis.

shlomitsur commented 3 years ago

Sure. [image] The selenosis logs are huge, so I'll share some error messages: [image]

alcounit commented 3 years ago

"sure [image]"

Is this the full log from that container?

shlomitsur commented 3 years ago

Yes. Here's another one: [image]

alcounit commented 3 years ago

@shlomitsur that looks odd, because there should be some entries at the end like:

vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9 seleniferous {"level":"warning","msg":"session vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9 delete request","request":"DELETE /wd/hub/session/vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9","request_by":"selenosis-5fd476b9bb-r9d2h","request_id":"29f8146c-0933-49d4-aa52-939d41e9350d","time":"2021-04-26T12:11:53Z"}
vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9 seleniferous {"level":"info","msg":"proxy session","request":"DELETE /wd/hub/session/vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9","request_by":"selenosis-5fd476b9bb-r9d2h","request_id":"29f8146c-0933-49d4-aa52-939d41e9350d","time":"2021-04-26T12:11:53Z"}
vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9 seleniferous {"level":"warning","msg":"unexpected stop signal received","time":"2021-04-26T12:11:53Z"}
vnc-chrome-90-0-e4c08d0e-27bc-44d6-942a-88345eecaff9 seleniferous {"level":"warning","msg":"stopping seleniferous","time":"2021-04-26T12:11:53Z"}

In your screenshot, I see that the DELETE request was proxied to the browser. There are no new lines after the last message, is that correct?
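The check described here (does a seleniferous log end with the "stopping seleniferous" entry?) can also be done mechanically when there are many pods to inspect. Below is a rough Go sketch that assumes the JSON log format shown in this thread, optionally prefixed with the pod and container names. The names `logEntry`, `lastEntry`, and `endedWithShutdown` are hypothetical and not part of seleniferous.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry mirrors the fields visible in the seleniferous log lines above.
type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
	Time  string `json:"time"`
}

// lastEntry returns the last line that parses as a JSON log entry.
// Any text before the first "{" (such as a "pod container" prefix) is skipped.
func lastEntry(logs string) (logEntry, bool) {
	var last logEntry
	found := false
	sc := bufio.NewScanner(strings.NewReader(logs))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, "{")
		if i < 0 {
			continue
		}
		var e logEntry
		if err := json.Unmarshal([]byte(line[i:]), &e); err != nil {
			continue // ignore lines that are not valid JSON
		}
		last, found = e, true
	}
	return last, found
}

// endedWithShutdown reports whether the final parsed entry is the
// "stopping seleniferous" message seen in the logs in this thread.
func endedWithShutdown(logs string) bool {
	e, ok := lastEntry(logs)
	return ok && e.Msg == "stopping seleniferous"
}

func main() {
	// Shortened sample based on the log lines quoted above.
	logs := `seleniferous {"level":"info","msg":"proxy session","time":"2021-04-26T12:11:53Z"}
seleniferous {"level":"warning","msg":"stopping seleniferous","time":"2021-04-26T12:11:53Z"}`
	fmt.Println("ended with shutdown message:", endedWithShutdown(logs))
}
```

A log whose last entry is "proxy session" rather than "stopping seleniferous" would match the situation described in the screenshot, where nothing follows the proxied DELETE request.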

alcounit commented 3 years ago

@shlomitsur please try alcounit/seleniferous:v1.0.1; this should fix the deletion of zombie pods.

shlomitsur commented 3 years ago

Good morning @alcounit! Things are looking good so far: no hanging pods. Thanks

alcounit commented 3 years ago

@shlomitsur great to hear it.