WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

Cannot restart app at Toolforge #961

Closed: fnielsen closed this issue 4 years ago

fnielsen commented 4 years ago

The app hangs after:

WSGI app 1 (mountpoint='/scholia') ready in 55 seconds on interpreter 0x24fc760 pid: 1
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 17, cores: 1)
spawned uWSGI worker 2 (pid: 18, cores: 1)
spawned uWSGI worker 3 (pid: 19, cores: 1)
spawned uWSGI worker 4 (pid: 20, cores: 1)

and "504 Gateway Time-out" from the site

From kubectl (a shorter node lookup is sketched after this output):

$ kubectl describe node `kubectl describe pods -l name=scholia | head -n 3 | tail -n 1 | awk '{print $2}' | awk -F"/" '{print $1}'`
Name:           tools-worker-1008.tools.eqiad.wmflabs
Labels:         kubernetes.io/hostname=tools-worker-1008.tools.eqiad.wmflabs
Taints:         <none>
CreationTimestamp:  Sat, 12 Mar 2016 04:23:52 +0000
Phase:          
Conditions:
  Type          Status  LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------  -----------------           ------------------          ------              -------
  Ready         True    Mon, 02 Dec 2019 21:41:47 +0000     Sat, 30 Nov 2019 22:34:44 +0000     KubeletReady            kubelet is posting ready status
  OutOfDisk         False   Mon, 02 Dec 2019 21:41:47 +0000     Sat, 30 Nov 2019 22:34:33 +0000     KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure    False   Mon, 02 Dec 2019 21:41:47 +0000     Thu, 07 Jul 2016 13:14:57 +0000     KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure      False   Mon, 02 Dec 2019 21:41:47 +0000     Tue, 24 Jan 2017 14:02:57 +0000     KubeletHasNoDiskPressure    kubelet has no disk pressure
Addresses:      172.16.3.216,172.16.3.216
Capacity:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   4
 memory:                8179156Ki
 pods:                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   4
 memory:                8179156Ki
 pods:                  110
System Info:
 Machine ID:            d59e74c8689f444f8bf8656805f9ae89
 System UUID:           0B907BA7-3827-4DF7-8836-B728818EB44F
 Boot ID:           ea066490-b92d-4a30-9a1c-53ccfe34d9b1
 Kernel Version:        4.9.0-0.bpo.6-amd64
 OS Image:          Debian GNU/Linux 8 (jessie)
 Operating System:      linux
 Architecture:          amd64
 Container Runtime Version: docker://1.12.6
 Kubelet Version:       v1.4.6+e569a27
 Kube-Proxy Version:        v1.4.6+e569a27
ExternalID:         tools-worker-1008.tools.eqiad.wmflabs
Pods:               not authorized
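As an aside, the head | tail | awk pipeline above can probably be shortened. A sketch, assuming the installed kubectl supports wide output for pods (untested on this old cluster):

$ kubectl get pods -l name=scholia -o wide
# the NODE column shows where each pod is scheduled; the node can then be inspected with
$ kubectl describe node tools-worker-1008.tools.eqiad.wmflabs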

I have tried checking out an old version with git checkout f4a14cfcf47d51a45ec3b3d3705c5fbefb8038cf.
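A sketch of what such a rollback typically involves on Toolforge, assuming the standard webservice tooling (the source path and webservice type are assumptions):

$ become scholia
$ cd ~/www/python/src
$ git checkout f4a14cfcf47d51a45ec3b3d3705c5fbefb8038cf
$ webservice --backend=kubernetes python restart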

fnielsen commented 4 years ago

Tried to restart Ordia to see if there is a similar problem. It does not get through either and hangs at:

spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 11, cores: 1)
spawned uWSGI worker 2 (pid: 12, cores: 1)
spawned uWSGI worker 3 (pid: 13, cores: 1)
spawned uWSGI worker 4 (pid: 14, cores: 1)

fnielsen commented 4 years ago

Unsure whether this is relevant https://phabricator.wikimedia.org/T239569

fnielsen commented 4 years ago

https://phabricator.wikimedia.org/P9797

fnielsen commented 4 years ago

Possibly " a problem with k8s ingress affecting new pods"

fnielsen commented 4 years ago

A fix is being worked on here: https://phabricator.wikimedia.org/T239670

Krenair commented 4 years ago

Try now.

fnielsen commented 4 years ago

@Krenair Thanks!

Krenair commented 4 years ago

You can thank @crookedstorm for the proper debugging and fix :)