canonical / mysql-operator

Machine charm for MySQL following the operator framework
https://charmhub.io/mysql
Apache License 2.0

`innodb_buffer_pool_size` and `max_connections` are trying to use ~175% of the physical memory? #407

Closed. nobuto-m closed this issue 2 months ago.

nobuto-m commented 5 months ago

(There is a question mark in the title since I'm not 100% sure yet, pending a stress test.)

Steps to reproduce

  1. juju deploy mysql --channel 8.0/edge

Expected behavior

mysqld should allocate memory for innodb_buffer_pool_size and max_connections within the amount of physical memory available.
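
In other words (a rough sketch of the expected invariant; the ~12MiB-per-connection figure is an assumption borrowed from the discussion below, not a value read out of the charm):

# Sketch of the sizing invariant that should hold on a unit.
PER_CONNECTION_BYTES = 12 * 1024 * 1024  # assumed per-connection overhead

def fits_in_memory(innodb_buffer_pool_size: int, max_connections: int, available_memory: int) -> bool:
    """Both settings combined should stay within the unit's physical memory."""
    return innodb_buffer_pool_size + max_connections * PER_CONNECTION_BYTES <= available_memory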

Actual behavior

The charm writes the following config on a system with 32GB of memory.

[/var/snap/charmed-mysql/current/etc/mysql/mysql.conf.d/z-custom-mysqld.cnf]

[mysqld]
bind-address = 0.0.0.0
mysqlx-bind-address = 0.0.0.0
report_host = juju-584fb0-0.lxd
max_connections = 2570
innodb_buffer_pool_size = 23219666944
log_error_services = log_filter_internal;log_sink_internal
log_error = /var/snap/charmed-mysql/common/var/log/mysql/error.log
general_log = ON
general_log_file = /var/snap/charmed-mysql/common/var/log/mysql/general.log
slow_query_log_file = /var/snap/charmed-mysql/common/var/log/mysql/slowquery.log
innodb_buffer_pool_chunk_size = 2902458368

So it looks like innodb_buffer_pool_size tries to use ~23GB of memory at maximum and max_connections tries to use ~32GB of memory at maximum, on the same 32GB of physical memory.
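
For reference, here is the back-of-the-envelope arithmetic behind the ~175% figure, using the values from the generated config and the debug log below (the ~12MiB-per-connection overhead is an assumption, consistent with how max_connections is derived):

# Values taken from z-custom-mysqld.cnf and the debug log on this 32GB unit.
AVAILABLE_MEMORY = 32341442560           # bytes
INNODB_BUFFER_POOL_SIZE = 23219666944    # bytes
MAX_CONNECTIONS = 2570
PER_CONNECTION_BYTES = 12 * 1024 * 1024  # assumed per-connection overhead

buffer_pool_share = INNODB_BUFFER_POOL_SIZE / AVAILABLE_MEMORY
connections_share = MAX_CONNECTIONS * PER_CONNECTION_BYTES / AVAILABLE_MEMORY

print(f"buffer pool: {buffer_pool_share:.0%}")   # ~72%
print(f"connections: {connections_share:.0%}")   # ~100%
print(f"combined:    {buffer_pool_share + connections_share:.0%}")  # ~172%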

https://github.com/canonical/mysql-operator/blob/8bf9fb36ce23b06f5b0838868a2ebe54486d912f/lib/charms/mysql/v0/mysql.py#L717-L727

[image: mysql_major_parameters]

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.4.0-genericlinux-amd64

Juju agent: 3.4.0

Charm revision: mysql 8.0/edge 210

LXD: 5.20

Log output

Juju debug log:

This is from extra debug logging, with by-hand modifications to add some traces.

2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 available_memory=32341442560
2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 memory_limit=0
2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 available_memory=32341442560
2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 innodb_buffer_pool_size=23219666944
2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 innodb_buffer_pool_chunk_size=2902458368
2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 group_replication_message_cache_size=None
2024-03-05 14:13:28 WARNING unit.mysql/0.juju-log server.go:325 max_connections=2570

Additional context

github-actions[bot] commented 5 months ago

https://warthogs.atlassian.net/browse/DPE-3706

paulomach commented 5 months ago

@nobuto-m, you are absolutely correct. Created a PR with the fix.

nobuto-m commented 5 months ago

https://github.com/canonical/mysql-operator/pull/429#discussion_r1544142953

Essentially, this is (at least) a 3-parameter problem to me. The proposed logic is better than the current one since it avoids overcommitting memory to ~175%. However, keeping the user-configurable parameters limited to one doesn't solve the fundamental challenge.

[image: mysql_major_parameters]

At this moment, we don't allow any configuration other than parameter "A" in the diagram. I assume that the mysql.py logic is shared between the machine charm and the k8s charms, so parameter "A" is determined either by the total amount of physical memory, the memory_limit charm config, or the pod resource limit. But no other parameters about the memory allocation can be supplied by a user.
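
As a sketch of what I mean (not the actual mysql.py code; the function and argument names here are made up for illustration), parameter "A" ends up being the tightest of those bounds, and nothing else about the split is configurable:

def resolve_parameter_a(physical_memory: int, memory_limit: int = 0, pod_limit: int = 0) -> int:
    """Illustrative only: pick the effective memory budget ("A" in the diagram).

    memory_limit stands in for the charm config and pod_limit for a k8s
    resource limit; 0 means the bound is not set.
    """
    bounds = [limit for limit in (memory_limit, pod_limit) if limit > 0]
    return min([physical_memory, *bounds])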

When a user needs to bump max_connections, we will have to bump the memory requirement four times over unnecessarily, since there is no way to tell the charm to change the ratio of the memory allocation (75% innodb buffer pool size + 25% max_connections).

For example, with MySQL used as the backend of an OpenStack deployment, we know it requires a few thousand max connections to cover the distributed (and microservice-ish) control plane services and the multiple workers of each microservice, but 8GB of innodb buffer pool size is more than enough.

Let's say we want max_connections=4,000; a 64GB instance is more than enough (4,000 connections * 12MB + 8GB innodb buffer pool size = 56GB). However, with the fixed ratio, we need a ~200GB machine (4,000 connections * 12MB * (75+25)% / 25% = 192GB).
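
Spelling that out as a quick calculation (same assumptions as above: ~12MB per connection and the fixed 75/25 split):

PER_CONNECTION_BYTES = 12 * 1024 * 1024  # assumed per-connection overhead
MAX_CONNECTIONS = 4000
BUFFER_POOL_NEEDED = 8 * 1024**3         # 8GB is plenty for this workload

# What the workload actually needs: a 64GB instance covers it comfortably.
actual_need = MAX_CONNECTIONS * PER_CONNECTION_BYTES + BUFFER_POOL_NEEDED

# What the fixed ratio forces: connection memory may only use 25% of the
# machine, so the machine has to be four times the connection memory.
required_by_fixed_ratio = MAX_CONNECTIONS * PER_CONNECTION_BYTES / 0.25

print(actual_need / 1024**3)              # ~54.9 GiB (roughly the 56GB above)
print(required_by_fixed_ratio / 1024**3)  # 187.5 GiB (roughly the 192GB above)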

I cannot think of a better way than moving away from a single parameter.

paulomach commented 5 months ago

Hi @nobuto-m, indeed it's a very compelling argument. Conceptually, the operator tries to reach close-to-optimal self-calculated defaults, and we certainly need to refine the calculation further. I will take your case forward, given it's impossible to satisfy every use case.