antonputra / tutorials

DevOps Tutorials
https://youtube.com/antonputra
MIT License
2.96k stars 2.43k forks source link

Lession 214 - Database fine-tuning #307

Open melroy89 opened 1 week ago

melroy89 commented 1 week ago

Hi,

You didn't mention anything about fine-tuning the database servers? By default both PostgreSQL as well as MySQL/MariaDB are not very well configured with their default configuration options.

See my PostgreSQL config changes here: https://gitlab.melroy.org/-/snippets/610

And for MySQL/MariaDB see: https://gitlab.melroy.org/-/snippets/92

Did you made any changes at all? Since the defaults are really bad for both servers. Especially if you have enough RAM & fast SSDs.

Regards, Melroy

nscuro commented 1 week ago

For Postgres there's also PGTune, which yields a good starting point given the available resources.

The default Postgres config is suitable for running on minimal hardware (think Raspberry Pi), so definitely agree that tweaking it before running benchmarks is crucial.

antonputra commented 1 week ago

I used the default settings for the test. As for the hardware, I use i3en.large EC2 instances to run my tests. Thanks for the snippets and PGTune; I'll improve my setup for the next test.

melroy89 commented 1 week ago

Thanks for your response @antonputra!

I used the default settings for the test.

Yeah so indeed definitely do not try to use the default settings. Then you are mainly benchmarking how well the defaults are, but not the database engines.

I also believe i3en.large doesn't really reflect real hardware most server admins are using for these kind of databases in production. Maybe test with i3en.xlarge or i3en.3xlarge to be sure you are not bottlenecked by cloud infra too much....

Btw this is true for a lot of software, not just PostgreSQL or MySQL/MariaDB! Even languages like PHP, you can configure a lot and fine-tune a lot to get more out of the services.

antonputra commented 1 week ago

@melroy89, thanks again! If you could review the settings, that would be great. - https://github.com/antonputra/tutorials/tree/main/lessons/216

melroy89 commented 1 week ago

Let me check today!

melroy89 commented 1 week ago

OK, so about huge pages.

This can help a lot in databases. Huge pages can be turned on via: https://repost.aws/knowledge-center/configure-hugepages-ec2-linux-instance

If that all works you can try to enable huge_pages again. I'm not sure why pgtune would set this explicitly to off, most likely to avoid failures if you didn't enable huge pages under Linux. Anyhow... Try to enable it under Linux :)


I'm using Proxmox with Ubuntu VM.. If people use Proxmox as well, you can enable this option in your CPU hardware settings of the VM:

image

And then also enable it under the VM (see link above), and validate if it works:

image

melroy89 commented 1 week ago

Thanks I will also try to optimize MySQL even more using MySQLTuner..

And don't be afraid to play with the settings. For example, I'm curious whether decreasing innodb_buffer_pool_instances from 12 (1GB per instance) to 6 (to 2GB per instance) will actually decrease the overhead and improve the performance, or whether 12 is better.. Who knows, right? You might know soon enough..

EDIT: Fun fact, apparently innodb_buffer_pool_instances is deprecated under MariaDB now. Proof: https://mariadb.com/kb/en/innodb-buffer-pool/#innodb_buffer_pool_instances

AshleyPinner commented 6 days ago

Looking at the MySQL config:

  1. You have slow query logging enabled, which is disabled in your pgsql config - this might solely be the reason why MYSQL tanks once you get to a given number of req/s, as the slower the query, the more chance it has to be in the slow query log (which you've set to 1s) - this should be disabled, and is disabled by default, so not sure why you've enabled it?
  2. You haven't set any values for innodb_io_capacity and the _max options; While the defaults are better now you should adjust them as required.
  3. You probably want to set innodb_dedicated_server to ON (This will default some of the innodb options, so worth checking what gets set by this config value)
  4. You possibly want to increase the innodb_write_io_threads (defaults to 4 - though you also are only throwing 2 cores at the database, so you're constrained already)
  5. You possibly want to set innodb_flush_method to O_DSYNC
  6. You've set innodb_flush_log_at_trx_commit to 2 for some reason. This is no longer ACID compliant. If you were aiming to improve the way innodb flushes and you were throwing away ACID compliance, setting it to 0 is better. If you care about ACID, setting it to 1 is required.
  7. You set a non-default max_heap_table_size - but this will have no effect unless you're using ENGINE=memory for a table.
  8. In relation to the above, the create statements haven't been given for the mysql tables. While they should be similar to the pgsql ones, it's always a good idea to provide both to ensure that there's no changes to the tables between the versions that causes issues (eg allowing a null in a column which will always have non-null values disables two query optimisation methods that mysql uses in joins!)
  9. You've set transaction_isolation to READ COMMITTED and I can't work out why you'd do that. You then set binlog_format to ROW, and because you've set transaction_isolation to READ COMMITTED this is forced anyway (and ROW is the default!). Maybe a misunderstanding on what this does? The transaction isolation is pointless if you're not using transactions, though,
  10. Updates, inserts and deletes in MySQL can be slower if you've got binlog enabled. Whenever you write rows to innodb, it'll write the data to the various innodb files, and then write the row changes to the binlog. This is useful for a replication setup and for crash recovery, but in your case you're technically doing neither. I do not know enough about pgsql to know if it's doing the same duplication of writes for reslisancy, but if it turns out it's not, then you can either note that mysql will be more resilient to crashes or consider disabling the binary log for mysql to make the testing a bit more even.

Basically these tests are much closer to reality, but you've got a very odd mysql configuration!

I might have a play with this myself later and see what can work best for the mysql config.

melroy89 commented 6 days ago

Looking at the MySQL config:

And what about MariaDB as well? Since soon @antonputra will most likely also include MariaDB.

so not sure why you've enabled it?

Most likely he copied the slow logs from my config. While I do have it enabled (for good reasons). You definitely want to turn if off, for these kind of tests. So I agree with you.. I also left a comment here:

innodb_flush_log_at_trx_commit, transaction_isolation, binlog_format

Well I also run Nextcloud, I used these configurations by following: https://docs.nextcloud.com/server/latest/admin_manual/configuration_database/linux_database_configuration.html

if you think this is really really bad and wrong (for both MariaDB & MySQL). Maybe create a PR here as well: https://github.com/nextcloud/documentation/blob/master/admin_manual/configuration_database/linux_database_configuration.rst

It's saying:

To avoid data loss under high load scenarios (e.g. by using the sync client with many clients/users and many parallel operations) you need to configure the transaction isolation level accordingly.

AshleyPinner commented 6 days ago

I have almost zero experience in MariaDB, but I'd imagine a lot of the above still applies. One extra thing that you might need to work on in MariaDB is they continue to support the query cache, but that's been removed from MySQL itself since version 8 (remembering that they skipped 6 and 7). For a fair test it should be disabled.

It'd also be interesting to compare it to Percona - they have drop-in replacements for both MySQL and PostgreSQL, so could be interesting to bench Percona's MySQL vs Oracle's MySQL!

As for Nextcloud's suggestion, they make sense for their software because they're using transactions, but none of the tests here are, so changing the transaction isolation is a bit interesting.

melroy89 commented 4 days ago

One think we all agree with is removing the slow query logs. So I started with that, see PR: https://github.com/antonputra/tutorials/pull/319

smiera commented 1 day ago

@AshleyPinner thank you, thank you, THANK YOU! When I first saw this 'my.cnf' config, my eyes started to bleed. This is what happens when people write 'Benchmark' for software they don’t know how to use.

@antonputra, you make great and interesting videos. I often watch them and will continue to do so. But this 'Benchmark' – a complete failure.

melroy89 commented 1 day ago

@AshleyPinner thank you, thank you, THANK YOU! When I first saw this 'my.cnf' config, my eyes started to bleed. This is what happens when people write 'Benchmark' for software they don’t know how to use.

@antonputra, you make great and interesting videos. I often watch them and will continue to do so. But this 'Benchmark' – a complete failure.

Improvements are welcome to further improve any configuration you see in this repository. @antonputra is very willing to help and open for improvements. So feel free to create a PR yourself!