fedora-infra / bodhi

Bodhi is a web-system that facilitates the process of publishing updates for a Fedora-based software distribution.
https://bodhi.fedoraproject.org
GNU General Public License v2.0
155 stars 196 forks source link

Investigate performance issues #301

Closed lmacken closed 1 year ago

lmacken commented 9 years ago

We need to profile submitting an update and setting a request, as these things seem to be taking a very long time

ralphbean commented 9 years ago

Some ideas:

lmacken commented 9 years ago

Querying the wiki for test cases is extremely slow. We should move this to a backend process (issue #314)

[Thu Aug 27 19:03:29.658288 2015] [:error] [pid 11744] 2015-08-27 19:03:29,658 INFO [bodhi][Dummy-3] Querying the wiki for test cases [Thu Aug 27 19:05:37.587026 2015] [:error] [pid 11744] 2015-08-27 19:05:37,586 INFO [bodhi][Dummy-3] Done querying for test cases

lmacken commented 9 years ago

Okay, wiki & bug interactions have been moved to a backend fedmsg consumer on bodhi-backend02.

I think the next step is optimizing our model with indexes and maybe some additional eager-loading, etc.

pypingou commented 9 years ago

Note: we also have some tables that have just no primary key, just from a quick look:

They should be composite primary keys but I think they should be there and since primary keys are indexed we might gain some things there as well.

ralphbean commented 9 years ago

Cool. Will be interesting to see what adding those do to performance. As before, let's make sure to measure before and after speed times before we merge anything into develop.

ralphbean commented 9 years ago

Here's an example of a really slow query: https://bodhi.fedoraproject.org/updates/?packages=gnome-2048

ralphbean commented 9 years ago

So check these raw sql queries out. They both come from the get_updates function in bodhi/services/updates.py. The first one is the count_query and the second one is the call to query.all() that actually gets the updates.

SELECT count(DISTINCT updates.id) AS count_1
FROM updates
    LEFT OUTER JOIN releases AS releases_1 ON releases_1.id = updates.release_id
    LEFT OUTER JOIN comments AS comments_1 ON updates.id = comments_1.update_id
    LEFT OUTER JOIN users AS users_1 ON users_1.id = comments_1.user_id
    LEFT OUTER JOIN buildroot_overrides AS buildroot_overrides_1 ON users_1.id = buildroot_overrides_1.submitter_id
    LEFT OUTER JOIN builds AS builds_1 ON builds_1.id = buildroot_overrides_1.build_id
    LEFT OUTER JOIN releases AS releases_2 ON releases_2.id = builds_1.release_id
    LEFT OUTER JOIN packages AS packages_1 ON packages_1.id = builds_1.package_id
    LEFT OUTER JOIN stacks AS stacks_1 ON stacks_1.id = packages_1.stack_id
    LEFT OUTER JOIN builds AS builds_2 ON updates.id = builds_2.update_id
    LEFT OUTER JOIN releases AS releases_3 ON releases_3.id = builds_2.release_id
    LEFT OUTER JOIN packages AS packages_2 ON packages_2.id = builds_2.package_id
    LEFT OUTER JOIN stacks AS stacks_2 ON stacks_2.id = packages_2.stack_id
    LEFT OUTER JOIN buildroot_overrides AS buildroot_overrides_2 ON builds_2.id = buildroot_overrides_2.build_id
    LEFT OUTER JOIN users AS users_2 ON users_2.id = buildroot_overrides_2.submitter_id
    LEFT OUTER JOIN (
        update_bug_table AS update_bug_table_1
        JOIN bugs AS bugs_1 ON bugs_1.id = update_bug_table_1.bug_id
    ) ON updates.id = update_bug_table_1.update_id
    LEFT OUTER JOIN (
        update_cve_table AS update_cve_table_1
        JOIN cves AS cves_1 ON cves_1.id = update_cve_table_1.cve_id
    ) ON updates.id = update_cve_table_1.update_id
    LEFT OUTER JOIN users AS users_3 ON users_3.id = updates.user_id
    LEFT OUTER JOIN buildroot_overrides AS buildroot_overrides_3 ON users_3.id = buildroot_overrides_3.submitter_id
    LEFT OUTER JOIN builds AS builds_3 ON builds_3.id = buildroot_overrides_3.build_id
    LEFT OUTER JOIN releases AS releases_4 ON releases_4.id = builds_3.release_id
    LEFT OUTER JOIN packages AS packages_3 ON packages_3.id = builds_3.package_id
    LEFT OUTER JOIN stacks AS stacks_3 ON stacks_3.id = packages_3.stack_id

That was the count query. Here's the full one.

SELECT
    updates.id AS updates_id,
    updates.title AS updates_title,
    updates.karma AS updates_karma,
    updates.stable_karma AS updates_stable_karma,
    updates.unstable_karma AS updates_unstable_karma,
    updates.requirements AS updates_requirements,
    updates.require_bugs AS updates_require_bugs,
    updates.require_testcases AS updates_require_testcases,
    updates.notes AS updates_notes,
    updates.type AS updates_type,
    updates.status AS updates_status,
    updates.request AS updates_request,
    updates.severity AS updates_severity,
    updates.suggest AS updates_suggest,
    updates.locked AS updates_locked,
    updates.pushed AS updates_pushed,
    updates.critpath AS updates_critpath,
    updates.close_bugs AS updates_close_bugs,
    updates.date_submitted AS updates_date_submitted,
    updates.date_modified AS updates_date_modified,
    updates.date_approved AS updates_date_approved,
    updates.date_pushed AS updates_date_pushed,
    updates.date_testing AS updates_date_testing,
    updates.date_stable AS updates_date_stable,
    updates.alias AS updates_alias,
    updates.old_updateid AS updates_old_updateid,
    updates.release_id AS updates_release_id,
    updates.user_id AS updates_user_id,
    releases_1.id AS releases_1_id,
    releases_1.name AS releases_1_name,
    releases_1.long_name AS releases_1_long_name,
    releases_1.version AS releases_1_version,
    releases_1.id_prefix AS releases_1_id_prefix,
    releases_1.branch AS releases_1_branch,
    releases_1.dist_tag AS releases_1_dist_tag,
    releases_1.stable_tag AS releases_1_stable_tag,
    releases_1.testing_tag AS releases_1_testing_tag,
    releases_1.candidate_tag AS releases_1_candidate_tag,
    releases_1.pending_testing_tag AS releases_1_pending_testing_tag,
    releases_1.pending_stable_tag AS releases_1_pending_stable_tag,
    releases_1.override_tag AS releases_1_override_tag,
    releases_1.state AS releases_1_state,
    comments_1.id AS comments_1_id,
    comments_1.karma AS comments_1_karma,
    comments_1.karma_critpath AS comments_1_karma_critpath,
    comments_1.text AS comments_1_text,
    comments_1.anonymous AS comments_1_anonymous,
    comments_1.timestamp AS comments_1_timestamp,
    comments_1.update_id AS comments_1_update_id,
    comments_1.user_id AS comments_1_user_id,
    users_1.id AS users_1_id,
    users_1.name AS users_1_name,
    users_1.email AS users_1_email,
    releases_2.id AS releases_2_id,
    releases_2.name AS releases_2_name,
    releases_2.long_name AS releases_2_long_name,
    releases_2.version AS releases_2_version,
    releases_2.id_prefix AS releases_2_id_prefix,
    releases_2.branch AS releases_2_branch,
    releases_2.dist_tag AS releases_2_dist_tag,
    releases_2.stable_tag AS releases_2_stable_tag,
    releases_2.testing_tag AS releases_2_testing_tag,
    releases_2.candidate_tag AS releases_2_candidate_tag,
    releases_2.pending_testing_tag AS releases_2_pending_testing_tag,
    releases_2.pending_stable_tag AS releases_2_pending_stable_tag,
    releases_2.override_tag AS releases_2_override_tag,
    releases_2.state AS releases_2_state,
    builds_1.id AS builds_1_id,
    builds_1.nvr AS builds_1_nvr,
    builds_1.inherited AS builds_1_inherited,
    builds_1.package_id AS builds_1_package_id,
    builds_1.release_id AS builds_1_release_id,
    builds_1.update_id AS builds_1_update_id,
    packages_1.id AS packages_1_id,
    packages_1.name AS packages_1_name,
    packages_1.requirements AS packages_1_requirements,
    packages_1.stack_id AS packages_1_stack_id,
    stacks_1.id AS stacks_1_id,
    stacks_1.name AS stacks_1_name,
    stacks_1.description AS stacks_1_description,
    stacks_1.requirements AS stacks_1_requirements,
    buildroot_overrides_1.id AS buildroot_overrides_1_id,
    buildroot_overrides_1.build_id AS buildroot_overrides_1_build_id,
    buildroot_overrides_1.submitter_id AS buildroot_overrides_1_submitter_id,
    buildroot_overrides_1.notes AS buildroot_overrides_1_notes,
    buildroot_overrides_1.submission_date AS buildroot_overrides_1_submission_date,
    buildroot_overrides_1.expiration_date AS buildroot_overrides_1_expiration_date,
    buildroot_overrides_1.expired_date AS buildroot_overrides_1_expired_date,
    releases_3.id AS releases_3_id,
    releases_3.name AS releases_3_name,
    releases_3.long_name AS releases_3_long_name,
    releases_3.version AS releases_3_version,
    releases_3.id_prefix AS releases_3_id_prefix,
    releases_3.branch AS releases_3_branch,
    releases_3.dist_tag AS releases_3_dist_tag,
    releases_3.stable_tag AS releases_3_stable_tag,
    releases_3.testing_tag AS releases_3_testing_tag,
    releases_3.candidate_tag AS releases_3_candidate_tag,
    releases_3.pending_testing_tag AS releases_3_pending_testing_tag,
    releases_3.pending_stable_tag AS releases_3_pending_stable_tag,
    releases_3.override_tag AS releases_3_override_tag,
    releases_3.state AS releases_3_state,
    builds_2.id AS builds_2_id,
    builds_2.nvr AS builds_2_nvr,
    builds_2.inherited AS builds_2_inherited,
    builds_2.package_id AS builds_2_package_id,
    builds_2.release_id AS builds_2_release_id,
    builds_2.update_id AS builds_2_update_id,
    packages_2.id AS packages_2_id,
    packages_2.name AS packages_2_name,
    packages_2.requirements AS packages_2_requirements,
    packages_2.stack_id AS packages_2_stack_id,
    stacks_2.id AS stacks_2_id,
    stacks_2.name AS stacks_2_name,
    stacks_2.description AS stacks_2_description,
    stacks_2.requirements AS stacks_2_requirements,
    users_2.id AS users_2_id,
    users_2.name AS users_2_name,
    users_2.email AS users_2_email,
    buildroot_overrides_2.id AS buildroot_overrides_2_id,
    buildroot_overrides_2.build_id AS buildroot_overrides_2_build_id,
    buildroot_overrides_2.submitter_id AS buildroot_overrides_2_submitter_id,
    buildroot_overrides_2.notes AS buildroot_overrides_2_notes,
    buildroot_overrides_2.submission_date AS buildroot_overrides_2_submission_date,
    buildroot_overrides_2.expiration_date AS buildroot_overrides_2_expiration_date,
    buildroot_overrides_2.expired_date AS buildroot_overrides_2_expired_date,
    bugs_1.id AS bugs_1_id,
    bugs_1.bug_id AS bugs_1_bug_id,
    bugs_1.title AS bugs_1_title,
    bugs_1.security AS bugs_1_security,
    bugs_1.parent AS bugs_1_parent,
    cves_1.id AS cves_1_id,
    cves_1.cve_id AS cves_1_cve_id,
    users_3.id AS users_3_id,
    users_3.name AS users_3_name,
    users_3.email AS users_3_email,
    releases_4.id AS releases_4_id,
    releases_4.name AS releases_4_name,
    releases_4.long_name AS releases_4_long_name,
    releases_4.version AS releases_4_version,
    releases_4.id_prefix AS releases_4_id_prefix,
    releases_4.branch AS releases_4_branch,
    releases_4.dist_tag AS releases_4_dist_tag,
    releases_4.stable_tag AS releases_4_stable_tag,
    releases_4.testing_tag AS releases_4_testing_tag,
    releases_4.candidate_tag AS releases_4_candidate_tag,
    releases_4.pending_testing_tag AS releases_4_pending_testing_tag,
    releases_4.pending_stable_tag AS releases_4_pending_stable_tag,
    releases_4.override_tag AS releases_4_override_tag,
    releases_4.state AS releases_4_state,
    builds_3.id AS builds_3_id,
    builds_3.nvr AS builds_3_nvr,
    builds_3.inherited AS builds_3_inherited,
    builds_3.package_id AS builds_3_package_id,
    builds_3.release_id AS builds_3_release_id,
    builds_3.update_id AS builds_3_update_id,
    packages_3.id AS packages_3_id,
    packages_3.name AS packages_3_name,
    packages_3.requirements AS packages_3_requirements,
    packages_3.stack_id AS packages_3_stack_id,
    stacks_3.id AS stacks_3_id,
    stacks_3.name AS stacks_3_name,
    stacks_3.description AS stacks_3_description,
    stacks_3.requirements AS stacks_3_requirements,
    buildroot_overrides_3.id AS buildroot_overrides_3_id,
    buildroot_overrides_3.build_id AS buildroot_overrides_3_build_id,
    buildroot_overrides_3.submitter_id AS buildroot_overrides_3_submitter_id,
    buildroot_overrides_3.notes AS buildroot_overrides_3_notes,
    buildroot_overrides_3.submission_date AS buildroot_overrides_3_submission_date,
    buildroot_overrides_3.expiration_date AS buildroot_overrides_3_expiration_date,
    buildroot_overrides_3.expired_date AS buildroot_overrides_3_expired_date
FROM updates
    LEFT OUTER JOIN releases AS releases_1 ON releases_1.id = updates.release_id
    LEFT OUTER JOIN comments AS comments_1 ON updates.id = comments_1.update_id
    LEFT OUTER JOIN users AS users_1 ON users_1.id = comments_1.user_id
    LEFT OUTER JOIN buildroot_overrides AS buildroot_overrides_1 ON users_1.id = buildroot_overrides_1.submitter_id
    LEFT OUTER JOIN builds AS builds_1 ON builds_1.id = buildroot_overrides_1.build_id
    LEFT OUTER JOIN releases AS releases_2 ON releases_2.id = builds_1.release_id
    LEFT OUTER JOIN packages AS packages_1 ON packages_1.id = builds_1.package_id
    LEFT OUTER JOIN stacks AS stacks_1 ON stacks_1.id = packages_1.stack_id
    LEFT OUTER JOIN builds AS builds_2 ON updates.id = builds_2.update_id
    LEFT OUTER JOIN releases AS releases_3 ON releases_3.id = builds_2.release_id
    LEFT OUTER JOIN packages AS packages_2 ON packages_2.id = builds_2.package_id
    LEFT OUTER JOIN stacks AS stacks_2 ON stacks_2.id = packages_2.stack_id
    LEFT OUTER JOIN buildroot_overrides AS buildroot_overrides_2 ON builds_2.id = buildroot_overrides_2.build_id
    LEFT OUTER JOIN users AS users_2 ON users_2.id = buildroot_overrides_2.submitter_id
    LEFT OUTER JOIN (
        update_bug_table AS update_bug_table_1
        JOIN bugs AS bugs_1 ON bugs_1.id = update_bug_table_1.bug_id
    ) ON updates.id = update_bug_table_1.update_id
    LEFT OUTER JOIN (
        update_cve_table AS update_cve_table_1
        JOIN cves AS cves_1 ON cves_1.id = update_cve_table_1.cve_id
    ) ON updates.id = update_cve_table_1.update_id
    LEFT OUTER JOIN users AS users_3 ON users_3.id = updates.user_id
    LEFT OUTER JOIN buildroot_overrides AS buildroot_overrides_3 ON users_3.id = buildroot_overrides_3.submitter_id
    LEFT OUTER JOIN builds AS builds_3 ON builds_3.id = buildroot_overrides_3.build_id
    LEFT OUTER JOIN releases AS releases_4 ON releases_4.id = builds_3.release_id
    LEFT OUTER JOIN packages AS packages_3 ON packages_3.id = builds_3.package_id
    LEFT OUTER JOIN stacks AS stacks_3 ON stacks_3.id = packages_3.stack_id
ORDER BY updates.date_submitted DESC, comments_1.timestamp

Pretty nuts.

I timed these two on a basic query to http://localhost:6543/updates/ with no filters in place. The count query took 3s and the query.all() query took 0.2s (almost no time at all).

pypingou commented 9 years ago

Tested with

import requests
import timeit

def test_bodhi_updates():
    url = 'http://0.0.0.0:6543/updates/?packages=gnome-2048'
    req = requests.get(url)
    assert req.status_code == 200

timeit.timeit(test_bodhi_updates, number=10)
ralphbean commented 9 years ago

Well, there's lot of lazy loading declared in the models. If I do the exact opposite and turn it all off, it doesn't get that much better. The query times in get_updates drop dramatically, but then later when __json__ is called, we spend just as much time making all the subsequent queries. It's equivalent. There's probably some ground in the middle that is the best of both worlds.

We could also consider trimming the number of relationships that the call to __json__ traverses.

pypingou commented 9 years ago

And the memory consumption bumps then when getting the __json__ ?

ralphbean commented 9 years ago

I pushed a commit (de8e1bd) that makes some improvements. I'm not submitting a PR yet though since there's probably many other ways to go about making this faster. Check out these results.

before, on commit 258d414

update_list 3.1128051281 seconds
update_list_gnome 40.4445779324 seconds
update_view 0.643528938293 seconds
update_view_gnome 106.289006948 seconds
release_list 0.0776150226593 seconds
release_view 0.0262730121613 seconds
comment_list 144.048120022 seconds
comment_view 0.0465550422668 seconds
user_list 0.23695397377 seconds
user_view 0.279469013214 seconds

after, on commit de8e1bd

update_list 1.60950088501 seconds
update_list_gnome 7.19923591614 seconds
update_view 0.735200166702 seconds
update_view_gnome 4.58303284645 seconds
release_list 0.0293581485748 seconds
release_view 0.0229330062866 seconds
comment_list 5.7861289978 seconds
comment_view 0.0673890113831 seconds
user_list 0.232038021088 seconds
user_view 0.113650798798 seconds

I'd like to update that perf-test tool to compare different git commits and make producing tables like these easy (so we can be more sure of what we're doing).

ralphbean commented 9 years ago

Found another spot that can probably be improved in the same way as #612.

            "user": {
                "avatar": "https://seccdn.libravatar.org/avatar/9c9f7784935381befc302fe3c814f9136e7a33953d0318761669b8643f4df55c?s=24&d=retro", 
                "email": "rbean@redhat.com", 
                "groups": [
                    {
                        "name": "packager", 
                        "stacks": []
                    }, 
                    {
                        "name": "bodhiadmin", 
                        "stacks": []
                    }, 
                    {
                        "name": "sysadmin-main", 
                        "stacks": []
                    }, 
                    {
                        "name": "infra-sig", 
                        "stacks": []
                    }
                ], 
                "id": 128, 
                "name": "ralph", 
                "openid": "ralph.id.fedoraproject.org"
            }

When you query for an update, one of the fields is the "user" that submitted the update. Well, that user has groups (which all get loaded) and those groups have stacks, which all get loaded. That is unnecessary, and will likely save us a lot of time if we trim it out.

lmacken commented 9 years ago

Okay, so this line in Update.comment is causing a ton of CPU+DB+memory churn.

user.comments.append(comment)

This causes all comments for the user to get loaded into memory.

http://pieces.openpolitics.com/2006/07/sqlalchemy-beware-of-backref/

ralphbean commented 9 years ago

OH!

lmacken commented 9 years ago

It seems like lazy='dynamic' is the proper solution, but when doing it on User.comments, I hit the following:

Traceback (most recent call last):                                                            
  File "/usr/lib/python2.7/site-packages/bodhi/services/updates.py", line 128, in set_request 
    update.set_request(request.db, action, request.user.name)                                 
  File "/usr/lib/python2.7/site-packages/bodhi/models/models.py", line 1114, in set_request   
    action.description, username, notes), author=u'bodhi')                                    
  File "/usr/lib/python2.7/site-packages/bodhi/models/models.py", line 1473, in comment       
    if comment.anonymous or comment.user.name == u'bodhi':                                    
AttributeError: 'NoneType' object has no attribute 'name'                                     

Even if I set comment.user = user right above it.

ralphbean commented 9 years ago

Even if I set comment.user = user right above it.

That is really surprising.

lmacken commented 9 years ago

That is really surprising.

Oh wait, I think this was due to a busted comment from a previous hack. I think I've got it figured out.

genodeftest commented 7 years ago

Hm, it still takes me >60 seconds to get a list of packages in updates-testing when running fedora-easy-karma on Fedora 26. See https://bugzilla.redhat.com/show_bug.cgi?id=1266060.

mattiaverga commented 1 year ago

I believe this can be closed now. During the past years there have been several tweaks on database side and json object serialization which fixed most (if not all) bottlenecks. As usual, feel free to reopen or open a new bug if you sport something more.