The valuable hint came from John in MM:
So the "terminating due to SIGTERM" is because we got a SIGTERM (I believe) not because juju is initiating that request.
I believe that means that K8s is telling us we need to be torn down (either because it is rescheduling, or a health check failed, or something else).
The last seconds of the MySQL error log:
2023-07-12T11:09:58.830318Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.32-0ubuntu0.22.04.2).
2023-07-12T11:10:02.387287Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
ERROR "command terminated with exit code 137"
The K8s events show the pod was killed due to lack of RAM:
> kubectl get events -w
LAST SEEN TYPE REASON OBJECT MESSAGE
14s Warning EvictionThresholdMet node/gke-taurus-20152-default-pool-dd74c68c-7cq0 Attempting to reclaim memory
12s Normal NodeHasInsufficientMemory node/gke-taurus-20152-default-pool-dd74c68c-7cq0 Node gke-taurus-20152-default-pool-dd74c68c-7cq0 status is now: NodeHasInsufficientMemory
The node has 16GB of RAM, but only 12GB of it is allocatable:
> kubectl describe node gke-taurus-20152-default-pool-dd74c68c-7cq0
...
Capacity:
...
memory: 15358168Ki
Allocatable:
...
memory: 12658904Ki
Our default charm code sets innodb_buffer_pool_size to 75% of RAM (Capacity) = 11GB, which is close to the 12GB the node actually allows (Allocatable), and the node runs other services too => BOOM (see the arithmetic sketch after the config snippet below):
juju ssh --container mysql mysql-k8s/0 bash
root@mysql-k8s-0:/# cat /etc/mysql/mysql.conf.d/z-custom.cnf
[mysqld]
...
innodb_buffer_pool_size = 11811160064
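To make the numbers concrete, here is a quick arithmetic check (a plain Python sketch, not charm code; all figures are copied from the kubectl and cnf output above):

```python
KI = 1024  # kubectl reports memory quantities in Ki (kibibytes)

capacity_bytes    = 15_358_168 * KI  # node Capacity   (kubectl describe node)
allocatable_bytes = 12_658_904 * KI  # node Allocatable (kubectl describe node)
buffer_pool_bytes = 11_811_160_064   # innodb_buffer_pool_size from z-custom.cnf (11 GiB)

gib = 1024 ** 3
print(f"75% of Capacity:  {0.75 * capacity_bytes / gib:.2f} GiB")   # ≈ 10.99 GiB
print(f"Configured pool:  {buffer_pool_bytes / gib:.2f} GiB")       # 11.00 GiB
print(f"Node Allocatable: {allocatable_bytes / gib:.2f} GiB")       # ≈ 12.07 GiB
print(f"Headroom left:    {(allocatable_bytes - buffer_pool_bytes) / gib:.2f} GiB")  # ≈ 1.07 GiB
```

Roughly 1 GiB of allocatable memory is left for everything else in the pod (the XCom cache alone defaults to 1 GiB), so eviction under load is almost guaranteed.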
As a result, MySQL grew above the limit and was killed:
root@mysql-k8s-0:/# top
top - 11:10:03 up 18:47, 0 users, load average: 5.22, 6.06, 4.43
Tasks: 10 total, 1 running, 9 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0/0.0 0[ ]
%Cpu1 : 7.1/0.0 7[||||||| ]
%Cpu2 : 0.0/0.0 0[ ]
%Cpu3 : 0.0/7.1 7[||||||| ]
MiB Mem : 14998.2 total, 361.0 free, 12576.1 used, 2061.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2049.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
234 mysql 20 0 15.9g 11.5g 29560 S 7.1 78.3 34:13.28 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --log-error=/var/log/mysql/error.log --pid-file=mysql-k8s-0.pid
1584 root 20 0 7384 3284 2680 R 7.1 0.0 0:00.36 top
TODO: calculate innodb_buffer_pool_size as 75% of Allocatable RAM on K8s.
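A minimal sketch of what that TODO could look like, assuming the official `kubernetes` Python client (the real charm may use a different client, e.g. lightkube, and a different way to discover its node name; the charm's service account also needs RBAC permission to read nodes):

```python
from kubernetes import client, config

_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "K": 1000, "M": 1000**2, "G": 1000**3}

def parse_k8s_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '12658904Ki' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes, no suffix

def allocatable_memory_bytes(node_name: str) -> int:
    """Read status.allocatable.memory of the node this pod is scheduled on."""
    config.load_incluster_config()  # running inside the pod
    node = client.CoreV1Api().read_node(node_name)
    return parse_k8s_memory(node.status.allocatable["memory"])

def innodb_buffer_pool_size(node_name: str) -> int:
    """75% of Allocatable (not Capacity), as this TODO proposes."""
    return int(allocatable_memory_bytes(node_name) * 0.75)
```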
TL;DR: @paulomach, please commit group_replication_message_cache_size=128MB for profile=testing. As for profile=production, decrease the value of innodb_buffer_pool_size: calculate it as 75% of Allocatable RAM and then subtract the current group_replication_message_cache_size (1GB).
Long version: MySQL InnoDB Group Replication has an internal XCom cache; read more here: https://dev.mysql.com/doc/refman/8.0/en/group-replication-performance-xcom-cache-reduce.html
What we saw in the standard deployment: the K8s node has 16GB, Allocatable=12GB, the InnoDB buffer pool got 11GB (75% of 16GB), the XCom cache grew to 1GB, and the node killed the pod for growing above the Allocatable RAM.
In a constrained deployment: the pod got a 5GB limit, the InnoDB buffer pool got 4GB (75% of RAM), the XCom cache reached 1GB, and mysqld was killed by the OOM killer:
4m4s Warning OOMKilling node/gke-taurus-20152-default-pool-dd74c68c-wp6s Memory cgroup out of memory: Killed process 68365 (mysqld) total-vm:8573716kB, anon-rss:5206312kB, file-rss:41072kB, shmem-rss:0kB, UID:584788 pgtables:10880kB oom_score_adj:659
We should consider tuning the XCom cache properly for K8s in the future, but for now let's keep the default values and decrease the InnoDB buffer pool size by the XCom cache amount (to keep free RAM for it to grow into). Tnx!
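One way to express the proposed sizing, as a sketch (the profile names come from the issue text above; the function names are illustrative, not the charm's actual code):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# MySQL's default group_replication_message_cache_size (the XCom cache) is 1 GiB.
XCOM_CACHE_DEFAULT = 1 * GIB

def production_buffer_pool(allocatable_bytes: int) -> int:
    """profile=production: 75% of Allocatable, minus the XCom cache, so the
    cache can grow without pushing the pod over the node's allocatable RAM."""
    return int(allocatable_bytes * 0.75) - XCOM_CACHE_DEFAULT

def testing_message_cache_size() -> int:
    """profile=testing: shrink the XCom cache itself to 128 MiB, as proposed above."""
    return 128 * MIB

# For the node in this report (Allocatable = 12658904Ki ≈ 12.07 GiB):
# production_buffer_pool(12_658_904 * 1024) ≈ 8.05 GiB instead of the current 11 GiB.
```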
@paulomach the testing on GKE shows way better stability results, but it still failed due to a killed model controller (a separate/next topic). Please finish the draft to a merge-ready state and let's merge it into edge to continue testing. Tnx!
Steps to reproduce
1) Deploy mysql-k8s + mysql-router-k8s on GKE using https://discourse.charmhub.io/t/charmed-mysql-k8s-how-to-deploy-on-gke/10875
2) Run performance testing using https://discourse.charmhub.io/t/charmed-mysql-k8s-how-to-performance-test/11073
Expected behavior
The performance test must finish successfully.
Actual behavior
The test fails with an error: literally, the MySQL K8s Primary has gone away. K8s says the pod has been recreated (not even restarted). The Juju debug logs show an attempt to run update-status on the MySQL Primary node.
Versions
Operating system: 22.04
Juju CLI: 2.9.44
Juju agent: 2.9.44
Charm revision: 8.0/edge
microk8s: tested on GKE and microk8s v1.27.2 (snap 1.27/stable, rev 5372, classic)
Log output
Juju debug log:
As you can see above, the caasunitterminationworker generates a SIGTERM after about 1 minute of waiting after "update-status" ran. It looks like we have some internal timer/timeout there.
The normal update-status case scenario (without the performance test):
As you can see, it took 4 seconds to do nothing (I have muted the update_status charm block). The line "ops 2.4.1 up and running" is missing in the logs above. Did ops fail to launch due to high unit/k8s/mysql load? 0_o
Additional context
It looks very similar to https://github.com/canonical/prometheus-k8s-operator/issues/426, which went nowhere... Note that we are using 2.9.44 with the stop-handling fixes, so https://bugs.launchpad.net/juju/+bug/1951415 is no longer a problem here!
The entire debug log with Juju 2.9 is here, see line 695 (SIGTERM time). The entire debug log with Juju 3.2 is here, see line 493 (SIGTERM time). Good ideas are welcome!