sgoggins opened this issue 5 years ago
I do think I can put the mysql temp directory on another device and squeeze more out of the database, btw ... but I think I have reached a scale where db optimizations alone are not going to be sufficient. As an old DBA, I recognize that when I can't blame the network, I can definitely blame the app 🤣
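Moving the temp directory itself is just a one-line change in my.cnf plus a restart; the path below is only an example:

tmpdir = /mnt/other-device/mysql-tmp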
I continue to be astounded by how you're using this at scale, @sgoggins. Very, very impressive (and way more than I've ever tried!)
I would agree, it's probably the database that's the issue. If I had to point a finger at one suspect, it's the way that Facade neurotically stuffs data into the database at every opportunity. I had originally done it this way so that if it failed mid-run, not much would be lost.
That said, the sheer volume of database transactions creates a maaaaaaasive amount of overhead in big databases. I think there's a fairly simple fix that will be low-hanging fruit, whereby the database transactions for a single repo are accumulated into a temporary in-memory database and then only pushed into the analysis_data table at the very end. In the case where facade-worker.py fails, you'll still lose whatever was in memory, but the performance gains should make up for it.
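One way the accumulate-then-flush idea could look (just a sketch; the column names are placeholders and a plain Python list stands in for the temporary in-memory database):

```python
# Sketch: collect a repo's results in memory, then write them in one batch.
# Column names are placeholders, not Facade's actual schema.
def analyze_repo(db, repo_id, commits):
    rows = []
    for commit in commits:
        # ... per-commit analysis happens here, entirely in memory ...
        rows.append((repo_id, commit['author_email'],
                     commit['added'], commit['removed']))

    # One batched round trip at the end instead of one INSERT per commit.
    with db.cursor() as cursor:
        cursor.executemany(
            "INSERT INTO analysis_data (repos_id, author_email, added, removed) "
            "VALUES (%s, %s, %s, %s)",
            rows)
    db.commit()
```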
The reason I say this is the lowest-hanging fruit is that by reducing database traffic, it may become feasible to use the native Python MySQL connector (PyMySQL). In a drag race it's considerably slower than mysqldb, but using pymysql gives us the ability to use PyPy. Everything in Facade works with PyPy already, and it'll even choose pymysql when run under PyPy.
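For illustration (this isn't the exact code in facade-worker.py, and the connection values are placeholders), the driver selection boils down to checking the interpreter:

```python
# Use the C-based MySQLdb under CPython; fall back to pure-Python pymysql
# under PyPy, where the C extension isn't an option.
import platform

if platform.python_implementation() == 'PyPy':
    import pymysql as mysql_driver
else:
    import MySQLdb as mysql_driver

# Placeholder connection values, for illustration only.
db = mysql_driver.connect(host='localhost', user='facade',
                          passwd='********', db='facade')
```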
@brianwarner: I did a lot of optimization of the MariaDB parameters (which are the same as MySQL's, but you know that story, I'm sure). I have them in a file in my fork right now, but perhaps they belong in a README instead.
tmp_table_size = 16106127360
max_heap_table_size = 16106127360
query_cache_limit = 4M
query_cache_size = 512M
join_buffer_size = 8M
thread_pool_idle_timeout = 40000
connect_timeout = 280000
deadlock_timeout_short = 100000
deadlock_timeout_long = 50000000
delayed_insert_timeout = 120000
innodb_lock_wait_timeout = 200000
lock_wait_timeout = 150000
interactive_timeout = 86500
net_read_timeout = 30000
net_write_timeout = 50000
slave_net_timeout = 40000
wait_timeout = 1500000
Yes, this array of timeout parameters is a bit of a "shotgun" approach.
innodb_buffer_pool_size = 107374182400
key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 16
max_connections = 250
table_cache = 16K
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 20
expire_logs_days = 10
max_binlog_size = 100M
128 GB of RAM, solid state drives … got all the repos and analysis_data loaded … it's been 21 hours loading project_weekly_cache … no CPU usage, so I am guessing the database is eating disk …
I've already made a set of modifications and database config notes on my fork, on the sgoggins/facade:augur branch … I'm thinking I can rewrite the query that loads the cache to go after one repository or project group at a time. Since this is a nominal, 4-hour thing for me (a very experienced database guy / formerly well-compensated Oracle DBA), I thought I would circle back and see if you would approve a pull request that modularized some of the functions in facade-worker.py into another Python file, or how you would recommend doing this.
The refactoring would change how the cache is built and add options for how it is executed (for example, one repository or project group at a time).
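To make the shape of that concrete, here's a rough sketch of the per-repository pass (table and column names are placeholders; the real cache query in facade-worker.py would essentially just gain a per-repo filter):

```python
# Sketch: rebuild project_weekly_cache one repository at a time, committing
# after each repo so a crash only loses that repo's pass. Placeholder schema.
def rebuild_weekly_cache(db):
    with db.cursor() as cursor:
        cursor.execute("SELECT id FROM repos")
        repo_ids = [row[0] for row in cursor.fetchall()]

    for repo_id in repo_ids:
        with db.cursor() as cursor:
            cursor.execute(
                "INSERT INTO project_weekly_cache "
                "(repos_id, author_email, week, added, removed) "
                "SELECT repos_id, author_email, YEARWEEK(author_date), "
                "       SUM(added), SUM(removed) "
                "FROM analysis_data "
                "WHERE repos_id = %s "
                "GROUP BY repos_id, author_email, YEARWEEK(author_date)",
                (repo_id,))
        db.commit()
```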
What do you think @brianwarner ?