MariaDB / mariadb-docker

Docker Official Image packaging for MariaDB
https://mariadb.org
GNU General Public License v2.0

Speed up 10.4+ timezone initialization #320

Closed grooverdan closed 4 years ago

grooverdan commented 4 years ago

MariaDB 10.4 made Aria the default storage engine for system tables.

Aria adds crash safety, exposed as the "transactional" table option, which MyISAM did not previously have.

The Aria implementation of checkpointing incurs significant penalty on fuse-overlayfs that occurs significantly in container environments, especially those without a /var/lib/mysql volume.

We work around this penalty by disabling the crash safety of timezone tables for the period of timezone initialization.

Analysis and timings are in https://jira.mariadb.org/browse/MDEV-23326. Local tests show that, with this change, 10.4 is only 0.8 seconds slower than 10.3 on startup (6.8 seconds total).

Version specific comments are used to ensure that ALTER TABLE statements aren't run on < 10.4 server versions.
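
As a rough illustration of the workaround described above (not the exact entrypoint code), the sketch below emits version-guarded ALTER TABLE statements around a timezone load. The helper names tz_set_transactional and wrap_tz_sql are hypothetical. The table list assumes the standard time_zone* tables in the mysql database, and TRANSACTIONAL is the Aria table option that controls crash safety.

```shell
# Sketch only: emit SQL that turns Aria crash safety off around a timezone load.
# Assumption: these are the standard time_zone* tables in the mysql database.
TZ_TABLES="time_zone time_zone_leap_second time_zone_name time_zone_transition time_zone_transition_type"

# Print one version-guarded ALTER TABLE per table; $1 is 0 (off) or 1 (on).
# MariaDB servers older than 10.4.0 treat /*!100400 ... */ as a plain comment.
tz_set_transactional() {
  for table in $TZ_TABLES; do
    printf '/*!100400 ALTER TABLE %s TRANSACTIONAL=%s */;\n' "$table" "$1"
  done
}

# Hypothetical wrapper: run any SQL-producing command between the two ALTER sets.
wrap_tz_sql() {
  tz_set_transactional 0
  "$@"
  tz_set_transactional 1
}

# Assumed usage inside an entrypoint (client options omitted):
#   wrap_tz_sql mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql --database=mysql
```

Because the whole stream goes to a single client invocation, an SQL error partway through would stop the load, which matches the later observation in this thread that errors abort container startup.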

closes #262

I'm unconvinced I can get any significant fix into MariaDB before the next release so this should close off a major issue for the next release(s).

This won't be the end of the story. Let's see if we can do all of docker_setup_db under docker_init_database_dir with a little upstream help and improve the startup time again.

grooverdan commented 4 years ago

FYI, the MariaDB releases have happened. I didn't get a fix into upstream before the release.

Further analysis indicates it's not just (fuse-)overlayfs that is affected, per the upstream MDEV.

While disabling crash safety during initialization carries some risk, any SQL errors will abort the starting of the container. I do have a crash-safe performance optimization in progress that will be ready for the next release (it takes ~3s for the tz initialization).

This change will help the default deployment of mariadb containers of the user base without penalty.

tianon commented 4 years ago

The Aria implementation of checkpointing incurs significant penalty on fuse-overlayfs that occurs significantly in container environments, especially those without a /var/lib/mysql volume.

This is the bit that has me confused -- this image defines /var/lib/mysql to be a volume, and the users reporting slowness are all thus using that default volume (there doesn't exist a way to "unvolume"), so none of them are using MariaDB on top of an overlay data directory (although I can see how/why that would cause a significant performance overhead, which is precisely why we define the VOLUME in the first place, even though it has downsides for more esoteric deployment methods).

The common thread we saw in the "slowness" discussions was spinning disks vs SSDs (or even SSDs with very low available IOPS), so I'd love to make sure we're testing the same thing before we merge a fix which is made assuming the two slowness tests are the same.

grooverdan commented 4 years ago

I was wrong about overlayfs being the cause. I generally saw problems even on my local NVMe drive. tmpfs as a VOLUME didn't seem to be an issue.

Catch me as @Daniel Black on https://mariadb.zulipchat.com because I'd like to make sure we understand each other fully on this and there's a lot of detail.

tianon commented 4 years ago

Oh, I'm aware there's a lot that's gone into this (I've been following your adventures in https://jira.mariadb.org/browse/MDEV-23326 :smile:), I just want to make sure you've done some tests on a non-NVMe (preferably spinning disk) drive as well to ensure the change is still dramatic there before we consider #262 fully "fixed" / closed.

@yosifkit just ran a simple test on a spinning drive in his system with 10.5.4 and it took ~11s before it even started the temporary server, and it was a full three minutes later when the temporary server was stopped (doing nothing but loading timezone data and setting a root password), so it's significantly more dramatic on a spinning drive, and I just want to make sure your testing has covered that case (since that's the one that's the most common in #262).

He's going to test this change on that same drive to get a simple comparison. :+1:

tianon commented 4 years ago

He had to test this change against 10.5.5 (because 10.5.4 is no longer available thanks to the new version being published) but it went from ~3m down to ~7s, so I'd say that's pretty compelling. :sweat_smile:

grooverdan commented 4 years ago

Model Family:     Western Digital Green
Device Model:     WDC WD40EZRX-00SPEB0
Serial Number:    WD-WCC4E5000UCH

ext4 mounted on /home/dan/datadir

The rest of the SMART output showed the drive to be in a poor state.

test script

for v in 10.3 10.4
do
  podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass \
    --expose 3306 \
    --volume /home/dan/datadir/data$v:/var/lib/mysql:Z \
    --name maria$v mariadb_test:$v &
  sleep 1
  time grep -iq "ready for start up" <(podman logs -f maria$v 2>&1) 
  podman logs maria$v
  sleep 1
  podman kill maria$v
  sleep 1
done

10.3 result

+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.3:/var/lib/mysql:Z --name maria10.3 mariadb_test:10.3
b5e35dbf6783dc0fffd3b41d755ddfae8617260f68abcde196287569a1b619f3
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.3

real    0m7.789s
user    0m0.000s
sys     0m0.002s

10.4 result

+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.4:/var/lib/mysql:Z --name maria10.4 mariadb_test:10.4
0b86680f45cc7f8af3e0e96e136ab2c6799187767e3517cc59f50dd15e065a61
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.4

real    0m13.793s
user    0m0.000s
sys     0m0.002s
+ podman logs maria10.4

grooverdan commented 4 years ago

and before change:

+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.4:/var/lib/mysql:Z --name maria10.4 mariadb:10.4
7365495faf0f4767909ea1818b0290730a51f40db45011767ab5b34ab300b39e
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.4

real    1m36.864s
user    0m0.000s
sys     0m0.002s
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.3:/var/lib/mysql:Z --name maria10.3 mariadb:10.3
c78d97c1889a0bdf37e87da7ef673046418bb5307cfab6c8265253445ecba2de
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.3

real    0m7.786s
user    0m0.002s
sys     0m0.000s
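
To put the timings above side by side, here is a small sketch that converts the "real" values printed by time into seconds and divides them. The helper names to_seconds and slowdown are hypothetical, and the inputs are the figures quoted in this thread.

```shell
# Convert a `time` value such as 1m36.864s into seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, parts, "m")        # parts[1] = minutes, parts[2] = seconds plus "s"
    sub(/s$/, "", parts[2])     # drop the trailing unit
    printf "%.3f\n", parts[1] * 60 + parts[2]
  }'
}

# Ratio of two `time` values: first divided by second, to one decimal place.
slowdown() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}

to_seconds 1m36.864s           # 10.4 before the change
slowdown 1m36.864s 0m13.793s   # 10.4 before vs. after the change
slowdown 0m13.793s 0m7.789s    # patched 10.4 vs. 10.3
```

On these numbers the change makes 10.4 startup roughly 7x faster on this drive, leaving it a little under 2x slower than 10.3.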

So the remaining question is whether you want to script in some Aria recovery (mysqlcheck --auto-repair) just in case. I'm putting together a test case for that now.
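
For reference, a minimal sketch of what such a recovery step might look like. It only prints the command rather than running it, so it works without a server. The helper name tz_repair_cmd is hypothetical, the table list assumes the standard time_zone* tables, and connection options are deliberately left out.

```shell
# Sketch only: build (but do not run) a check-and-repair command for the tz tables.
# Assumption: these are the standard time_zone* tables in the mysql database.
TZ_TABLES="time_zone time_zone_leap_second time_zone_name time_zone_transition time_zone_transition_type"

# Hypothetical helper; prints the command so it can be inspected before use.
tz_repair_cmd() {
  # mysqlcheck takes the database name first, then the tables to check;
  # --auto-repair repairs any table that the check finds corrupted.
  # $TZ_TABLES is left unquoted on purpose so it splits into separate arguments.
  echo mysqlcheck --auto-repair mysql $TZ_TABLES
}

tz_repair_cmd
```

Whether the entrypoint should run anything like this at all is the open question here.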

grooverdan commented 4 years ago

On crash recovery: I managed to kill the startup of 10.3 (MyISAM) with a volume, and the restart detected errors in the tz tables. The same applies now in 10.4 (though I haven't got the timings right; from the MDEV it seems there's a ~1 s window). As such, I propose to leave that as is.

tianon commented 4 years ago

Nice, thank you!! :metal: :heart:

I did a rebase against master (and ran update.sh to apply the docker-entrypoint.sh change across all versions). Once CI is green, I plan to merge. :+1: