sgoggins opened this issue 5 years ago
I do think I can put the mysql temp directory on another device and squeeze more out of the database, btw ... but I think I have reached a scale where db optimizations alone are not going to be sufficient. As an old DBA, I recognize that when I can't blame the network, I can definitely blame the app 🤣
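Moving the temp directory itself is just a one-line change in my.cnf plus a restart; the path below is only an example:

tmpdir = /mnt/other-device/mysql-tmp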
I continue to be astounded by how you're using this at scale, @sgoggins. Very, very impressive (and way more than I've ever tried!)
I would agree, it's probably the database that's the issue. If I had to point a finger at one suspect, it's the way that Facade neurotically stuffs data into the database at every opportunity. I had originally done it this way so that if it failed mid-run, not much would be lost.
That said, the sheer volume of database transactions creates a maaaaaaasive amount of overhead in big databases. I think there's a fairly simple fix that will be low-hanging fruit, whereby the database transactions for a single repo are accumulated into a temporary in-memory database and then only pushed into the analysis_data table at the very end. In the case where facade-worker.py fails, you'll still lose whatever was in memory, but the performance gains should make up for it.
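One way the accumulate-then-flush idea could look (just a sketch; the column names are placeholders and a plain Python list stands in for the temporary in-memory database):

```python
# Sketch: collect a repo's results in memory, then write them in one batch.
# Column names are placeholders, not Facade's actual schema.
def analyze_repo(db, repo_id, commits):
    rows = []
    for commit in commits:
        # ... per-commit analysis happens here, entirely in memory ...
        rows.append((repo_id, commit['author_email'],
                     commit['added'], commit['removed']))

    # One batched round trip at the end instead of one INSERT per commit.
    with db.cursor() as cursor:
        cursor.executemany(
            "INSERT INTO analysis_data (repos_id, author_email, added, removed) "
            "VALUES (%s, %s, %s, %s)",
            rows)
    db.commit()
```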
The reason I say this is the lowest-hanging fruit is that by reducing database traffic, it may become feasible to use the native Python MySQL connector (PyMySQL). In a drag race it's considerably slower than mysqldb, but using pymysql gives us the ability to use PyPy. Everything in Facade works with PyPy already, and it'll even choose pymysql when run under PyPy.
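For illustration (this isn't the exact code in facade-worker.py, and the connection values are placeholders), the driver selection boils down to checking the interpreter:

```python
# Use the C-based MySQLdb under CPython; fall back to pure-Python pymysql
# under PyPy, where the C extension isn't an option.
import platform

if platform.python_implementation() == 'PyPy':
    import pymysql as mysql_driver
else:
    import MySQLdb as mysql_driver

# Placeholder connection values, for illustration only.
db = mysql_driver.connect(host='localhost', user='facade',
                          passwd='********', db='facade')
```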
@brianwarner: I did a lot of optimization of the MariaDB parameters (which are the same as MySQL's, but you know that story, I'm sure). I have them in a file in my fork right now, but perhaps they belong in a README instead.
tmp_table_size = 16106127360
max_heap_table_size = 16106127360
query_cache_limit = 4M
query_cache_size = 512M
join_buffer_size = 8M
thread_pool_idle_timeout = 40000
connect_timeout = 280000
deadlock_timeout_short = 100000
deadlock_timeout_long = 50000000
delayed_insert_timeout = 120000
innodb_lock_wait_timeout = 200000
lock_wait_timeout = 150000
interactive_timeout = 86500
net_read_timeout = 30000
net_write_timeout = 50000
slave_net_timeout = 40000
wait_timeout = 1500000
Yes, this array of timeout parameters is a bit of a "shotgun" approach.
innodb_buffer_pool_size = 107374182400
key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 16
max_connections = 250
table_cache = 16K
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 20
expire_logs_days = 10
max_binlog_size = 100M
128 GB of RAM, solid state drives … got all the repos and analysis_data loaded … it's been 21 hours loading project_weekly_cache … no CPU usage, so I am guessing the database is eating disk …
I've already made a set of modifications and database config notes on my fork, on the sgoggins/facade:augur branch … I'm thinking I can rewrite the query that loads the cache to go after one repository or project group at a time. Since this is a nominal, 4-hour thing for me (a very experienced database guy / formerly well-compensated Oracle DBA), I thought I would circle back and see if you would approve a pull request that modularized some of the functions in facade-worker.py into another Python file, or how you would recommend doing this.
The refactoring would change how the cache is built and add options for how it is executed (for example, one repository or project group at a time).
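To make the shape of that concrete, here's a rough sketch of the per-repository pass (table and column names are placeholders; the real cache query in facade-worker.py would essentially just gain a per-repo filter):

```python
# Sketch: rebuild project_weekly_cache one repository at a time, committing
# after each repo so a crash only loses that repo's pass. Placeholder schema.
def rebuild_weekly_cache(db):
    with db.cursor() as cursor:
        cursor.execute("SELECT id FROM repos")
        repo_ids = [row[0] for row in cursor.fetchall()]

    for repo_id in repo_ids:
        with db.cursor() as cursor:
            cursor.execute(
                "INSERT INTO project_weekly_cache "
                "(repos_id, author_email, week, added, removed) "
                "SELECT repos_id, author_email, YEARWEEK(author_date), "
                "       SUM(added), SUM(removed) "
                "FROM analysis_data "
                "WHERE repos_id = %s "
                "GROUP BY repos_id, author_email, YEARWEEK(author_date)",
                (repo_id,))
        db.commit()
```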
What do you think @brianwarner ?