Closed nupplaphil closed 1 year ago
Can you tell how much memory MariaDB is using when it OOMs? mem_reservation: 4g
seems low, but maybe I misunderstand what it does.
Do you know which version your troubles began at?
Could also try jemalloc as a memory manager: env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
Can you tell how much memory MariaDB is using when it OOMs?
mem_reservation: 4g
seems low, but maybe I misunderstand what it does.Do you know which version your troubles began at?
tbh I don't know, I'm always using the
latest
version of the Docker images.
mem_reservation
: Allows you to specify a soft limit smaller than --memory which is activated when Docker detects contention or low memory on the host machine. If you use --memory-reservation, it must be set lower than --memory for it to take precedence. Because it is a soft limit, it does not guarantee that the container doesn’t exceed the limit.
https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory
Could also try jemalloc as a memory manager: env
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
I now switched, see:
+------------------------+------------------------------------------------------------+
| Variable_name | Value |
+------------------------+------------------------------------------------------------+
| version_malloc_library | jemalloc 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756 |
+------------------------+------------------------------------------------------------+
Shall I wait some time for a restart or what do you suggest?
This is some minutes after switching to jemalloc
:
friendicame-db-1 | 2023-02-26 13:03:17 0 [Note] mariadbd (initiated by: unknown): Normal shutdown
friendicame-db-1 | 2023-02-26 13:03:17 0 [Note] InnoDB: FTS optimize thread exiting.
friendicame-db-1 | 2023-02-26 13:03:17 0 [Note] InnoDB: Starting shutdown...
friendicame-db-1 | 2023-02-26 13:03:17 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
friendicame-db-1 | 2023-02-26 13:03:17 0 [Note] InnoDB: Restricted to 519200 pages due to innodb_buf_pool_dump_pct=25
friendicame-db-1 | 2023-02-26 13:03:17 0 [Note] InnoDB: Buffer pool(s) dump completed at 230226 13:03:17
friendicame-db-1 exited with code 0
friendicame-db-1 | 2023-02-26 13:03:28+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.11.2+maria~ubu2204 started.
friendicame-db-1 | 2023-02-26 13:03:28+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
friendicame-db-1 | 2023-02-26 13:03:28+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.11.2+maria~ubu2204 started.
friendicame-db-1 | 2023-02-26 13:03:29+00:00 [Note] [Entrypoint]: MariaDB upgrade not required
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] Starting MariaDB 10.11.2-MariaDB-1:10.11.2+maria~ubu2204-log source revision cafba8761af55ae16cc69c9b53a341340a845b36 as process 1
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Using transactional memory
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Number of transaction pools: 1
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Using liburing
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Initializing buffer pool, total size = 32.000GiB, chunk size = 512.000MiB
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Completed initialization of buffer pool
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
friendicame-db-1 | 2023-02-26 13:03:29 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=193796878816
friendicame-db-1 | 2023-02-26 13:03:44 0 [Note] InnoDB: Read redo log up to LSN=195420200063
friendicame-db-1 | 2023-02-26 13:03:59 0 [Note] InnoDB: Read redo log up to LSN=197030392720
friendicame-db-1 | 2023-02-26 13:04:14 0 [Note] InnoDB: Read redo log up to LSN=198389002919
friendicame-db-1 | 2023-02-26 13:04:29 0 [Note] InnoDB: Read redo log up to LSN=194182667856
friendicame-db-1 | 2023-02-26 13:04:44 0 [Note] InnoDB: Read redo log up to LSN=196107827853
friendicame-db-1 | 2023-02-26 13:04:59 0 [Note] InnoDB: Read redo log up to LSN=197986456996
friendicame-db-1 | 2023-02-26 13:05:13 0 [Note] InnoDB: Starting final batch to recover 435667 pages from redo log.
friendicame-db-1 | 2023-02-26 13:05:14 0 [Note] InnoDB: To recover: 333526 pages from log
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: Last binlog file './friendicame-bin.000017', position 3081397
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: 128 rollback segments are active.
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: log sequence number 199330293772; transaction id 369778667
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] Plugin 'FEEDBACK' is disabled.
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] Server socket created on IP: '0.0.0.0'.
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] Server socket created on IP: '::'.
friendicame-db-1 | 2023-02-26 13:05:21 0 [Note] mariadbd: ready for connections.
friendicame-db-1 | Version: '10.11.2-MariaDB-1:10.11.2+maria~ubu2204-log' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
friendicame-db-1 | 2023-02-26 13:05:22 6 [Warning] Aborted connection 6 to db: 'unconnected' user: 'unauthenticated' host: '192.168.32.7' (This connection closed normally without authentication)
friendicame-db-1 | 2023-02-26 13:05:24 0 [Note] InnoDB: Buffer pool(s) load completed at 230226 13:05:24
At least it seems like it's now exiting with code 0
.
This is the internal memory overview for this period:
Thanks for the mem_reservation
link. I've got a similar works within MariaDB in development (MDEV-24670). Is there docker logs to know when this contention period is detected.
Shall I wait some time for a restart or what do you suggest?
It looks like maybe docker saw the change and did the restart.
Just leave it running. jemalloc is reportedly better at preventing memory fragmentation. After a while of uptime you'll see if its leaking memory in the same way.
At least it seems like it's now exiting with code 0.
This is interesting. Not so much the 'existing with code' but the messages in the log. A normal stop
on the container should look like:
2023-02-26 23:26:43 0 [Note] mariadbd (initiated by: unknown): Normal shutdown
2023-02-26 23:26:43 0 [Note] InnoDB: FTS optimize thread exiting.
2023-02-26 23:26:43 0 [Note] InnoDB: Starting shutdown...
2023-02-26 23:26:43 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
2023-02-26 23:26:43 0 [Note] InnoDB: Buffer pool(s) dump completed at 230226 23:26:43
2023-02-26 23:26:43 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2023-02-26 23:26:43 0 [Note] InnoDB: Shutdown completed; log sequence number 46590; transaction id 15
2023-02-26 23:26:43 0 [Note] mariadbd: Shutdown complete
Notably the last InnoDB message there, and the last shutdown are the bits that would ensure the next startup isn't a crash recovery.
So the lack of messages makes it look like hard kill during shutdown.
The resolution of the 137 error code. From bash(2) an error code above 128 is a indication that the process was killed with a signal. The difference between 128, and 137 is 9, which by kill -9 is SIGKILL. While OOM can trigger this, a docker kill {container}
also does a SIGKILL (quite soon after SIGTERM by the looks of the shutdown in the last log).
How are you shutting down your containers? Because a stop
mechanism will give them enough time to shutdown cleanly and avoid crash recovery.
I never saw this kind of lines since I started analyzing the crash recovery issue
[...]
2023-02-26 23:26:43 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2023-02-26 23:26:43 0 [Note] InnoDB: Shutdown completed; log sequence number 46590; transaction id 15
2023-02-26 23:26:43 0 [Note] mariadbd: Shutdown complete
So it seems like Docker is killing the container ..
Normally, I use one of the following commands to start/stop/restart container:
I don't use any additional settings, like --force
or --wait
, ...
Normally, it works fine for every other container.
Shall I wait some time for a restart or what do you suggest?
It looks like maybe docker saw the change and did the restart.
Just to clarify, I restarted the container manually just to check if the jemalloc
maybe fixed it :)
The docker compose stop/restart lists a 10 second timeout. After which I assume it does a sigkill. The previous logs seem to match exactly to this 10 second timeout.
Quite simply, MariaDB takes longer than 10 seconds to shutdown cleanly. Increase the timeout.
And jemalloc probably wasn't necessary. Please confirm even though this is closed.
Confirmed :-)
docker-compose up -d -t 120
was enough
It took 1.5 minutes to shutdown, now everything is working as expected.
Sorry for this "newbie" error ..
PS: I switched back to system malloc, since I don't know what benefits I would have with jemalloic for my setup
Good. 1.5 minutes of shutdown is better than 2 minutes of crash recovery (and looks a lot better as "normal operation").
Not that much of a newbie question, or I would have seen it a few days ago. Certainly worth adding to a FAQ when I get time to write one.
jemalloc is worth trying if you see something like a memory leak. Differencing between that an memory fragmentation is hard. On general benefits, not much idea beyond it exists dedicated as an alternate malloc library.
Thanks for choosing MariaDB. Bug reports/feature requests welcome, even if they come off as a newbie error, someone else will hit it, so we'll try to document/prevent it if possible.
I'm using the Docker image for MariaDB a long time. And I'm facing this issue for the last 1/2 year I guess. After unsuccessfully playing around with the Docker settings and the performance-settings of MariaDB itself for this range of time, I'm now here :-) tbh I still don't know if there's something wrong with my setup or with MariaDB itself.
The setup:
And this issue occurs at the two "power-services". The DB size is 33GB and 108GB
My docker-compose file:
My
performance.cnf
: for the 33GB database:My issue: Every time, I restart the
db
service, it stops with exit code 137 (OOM) and restarts with acrash recovery
. I already tried to adjust the log file size, the capacity, ...I'm constantly using https://github.com/major/MySQLTuner-perl to find any hints about performance issues.
I was currently facing this issue after the last restart, here's the log:
Do you have any idea what's wrong with my config ?