Closed: grooverdan closed this 4 years ago
FYI mariadb releases have occurred. I didn't get a fix into upstream before the release.
Further analysis indicates it's not just (fuse-)overlayfs that is affected, per the upstream MDEV.
While disabling the crash safety during initialization carries some risk, any errors will abort the container startup because of the resulting SQL errors. I do have crash-safe performance-optimization work in progress that will be ready for the next release (with it, tz initialization takes ~3s).

This change will help the default deployment of mariadb containers across the user base without penalty.
The Aria implementation of checkpointing incurs a significant penalty on fuse-overlayfs, which is common in container environments, especially those without a /var/lib/mysql volume.
This is the bit that has me confused -- this image defines `/var/lib/mysql` to be a volume, and the users reporting slowness are thus all using that default volume (there is no way to "unvolume" it), so none of them are running MariaDB on top of an overlay data directory (although I can see how/why that would cause a significant performance overhead, which is precisely why we define the `VOLUME` in the first place, even though it has downsides for more esoteric deployment methods).
The common thread we saw in the "slowness" discussions was spinning disks vs SSDs (or even SSDs with very low available IOPS), so I'd love to make sure we're testing the same thing before we merge a fix which is made assuming the two slowness tests are the same.
I was wrong about overlayfs being the cause. I generally saw problems even on my local nvme. tmpfs as a VOLUME didn't seem to be an issue.
Catch me as @Daniel Black on https://mariadb.zulipchat.com, because I'd like to make sure we understand each other fully on this and there's a lot of detail.
Oh, I'm aware there's a lot that's gone into this (I've been following your adventures in https://jira.mariadb.org/browse/MDEV-23326 :smile:), I just want to make sure you've done some tests on a non-NVMe (preferably spinning disk) drive as well to ensure the change is still dramatic there before we consider #262 fully "fixed" / closed.
@yosifkit just ran a simple test on a spinning drive in his system with 10.5.4 and it took ~11s before it even started the temporary server, and it was a full three minutes later when the temporary server was stopped (doing nothing but loading timezone data and setting a root password), so it's significantly more dramatic on a spinning drive, and I just want to make sure your testing has covered that case (since that's the one that's the most common in #262).
He's going to test this change on that same drive to get a simple comparison. :+1:
He had to test this change against 10.5.5 (because 10.5.4 is no longer available thanks to the new version being published) but it went from ~3m down to ~7s, so I'd say that's pretty compelling. :sweat_smile:
```
Model Family:     Western Digital Green
Device Model:     WDC WD40EZRX-00SPEB0
Serial Number:    WD-WCC4E5000UCH
```

ext4 mounted on /home/dan/datadir; the rest of the SMART output showed the drive to be in not a great state.
test script

```bash
for v in 10.3 10.4; do
  podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass \
    --expose 3306 \
    --volume /home/dan/datadir/data$v:/var/lib/mysql:Z \
    --name maria$v mariadb_test:$v &
  sleep 1
  time grep -iq "ready for start up" <(podman logs -f maria$v 2>&1)
  podman logs maria$v
  sleep 1
  podman kill maria$v
  sleep 1
done
```
10.3 result
```
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.3:/var/lib/mysql:Z --name maria10.3 mariadb_test:10.3
b5e35dbf6783dc0fffd3b41d755ddfae8617260f68abcde196287569a1b619f3
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.3

real    0m7.789s
user    0m0.000s
sys     0m0.002s
```
10.4 result
```
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.4:/var/lib/mysql:Z --name maria10.4 mariadb_test:10.4
0b86680f45cc7f8af3e0e96e136ab2c6799187767e3517cc59f50dd15e065a61
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.4

real    0m13.793s
user    0m0.000s
sys     0m0.002s
+ podman logs maria10.4
```
and before the change:

```
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.4:/var/lib/mysql:Z --name maria10.4 mariadb:10.4
7365495faf0f4767909ea1818b0290730a51f40db45011767ab5b34ab300b39e
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.4

real    1m36.864s
user    0m0.000s
sys     0m0.002s

+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.3:/var/lib/mysql:Z --name maria10.3 mariadb:10.3
c78d97c1889a0bdf37e87da7ef673046418bb5307cfab6c8265253445ecba2de
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.3

real    0m7.786s
user    0m0.002s
sys     0m0.000s
```
So the remaining question is whether you want to script in some Aria recovery (`mysqlcheck --auto-repair`) just in case? I'm putting together a test case for that now.
On crash recovery: I managed to kill the startup of 10.3 (MyISAM) with a volume, and the restart detected errors in the tz tables. The same applies now in 10.4 (though I haven't got the timings right; from the MDEV it seems there's a ~1s window). As such I propose to leave that as is.
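If we did want such a repair step, a minimal sketch could look like the following. This is hypothetical, not what the entrypoint does: the credentials and target database are illustrative, though `--auto-repair` and `--check` are standard `mysqlcheck` options.

```shell
#!/bin/sh
# Hypothetical sketch: attempt an automatic repair of the mysql system
# tables after an unclean shutdown, before declaring the server ready.
# Guarded so it degrades gracefully when mysqlcheck is not installed
# or the server is not reachable.
if command -v mysqlcheck >/dev/null 2>&1; then
  mysqlcheck --auto-repair --check mysql -u root -p"$MYSQL_ROOT_PASSWORD" \
    && echo "system tables OK" \
    || echo "repair step failed (server not reachable?)"
else
  echo "mysqlcheck not available; skipping repair step"
fi
```

Whether this is worth wiring in depends on the conclusion above: if the server's own restart already detects and reports the errors, an extra pass may be redundant.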
Nice, thank you!! :metal: :heart:
I did a rebase against master (and ran `update.sh` to apply the `docker-entrypoint.sh` change across all versions). Once CI is green, I plan to merge. :+1:
MariaDB-10.4 defaulted to Aria for system tables.

This introduced crash safety, under the table option "transactional", that MyISAM previously did not have.

The Aria implementation of checkpointing incurs a significant penalty on fuse-overlayfs, which is common in container environments, especially those without a /var/lib/mysql volume.

We work around this penalty by disabling the crash safety of the timezone tables for the duration of timezone initialization.

Analysis and timings are in https://jira.mariadb.org/browse/MDEV-23326; local tests show that 10.4 is only 0.8 seconds slower than 10.3 on startup (6.8 seconds total).

Version-specific comments are used to ensure that the ALTER TABLE statements aren't run on server versions < 10.4.
closes #262
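As a sketch of how that version gating can work (the exact statements in the entrypoint may differ): `/*M!100400 ... */` is MariaDB's executable-comment form, whose body is run only by MariaDB servers >= 10.4.0, and `TRANSACTIONAL` is the Aria table option being toggled. The table names below are an illustrative subset of the real `mysql.time_zone*` tables.

```shell
# Print version-gated SQL that could wrap timezone loading.
# Servers >= 10.4.0 execute the body of /*M!100400 ... */ comments;
# older servers (10.3/MyISAM) treat them as plain comments, so the
# Aria-only TRANSACTIONAL table option never reaches them.
cat <<'SQL'
/*M!100400 ALTER TABLE mysql.time_zone TRANSACTIONAL=0 */;
/*M!100400 ALTER TABLE mysql.time_zone_transition TRANSACTIONAL=0 */;
-- ... pipe mysql_tzinfo_to_sql output here ...
/*M!100400 ALTER TABLE mysql.time_zone TRANSACTIONAL=1 */;
/*M!100400 ALTER TABLE mysql.time_zone_transition TRANSACTIONAL=1 */;
SQL
```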
I'm unconvinced I can get any significant fix into MariaDB before the next release, so this should close off a major issue for the next release(s).
This won't be the end of the story. Let's see if we can do all of `docker_setup_db` under `docker_init_database_dir` with a little upstream help and improve the startup time again.