cesium-ml / baselayer

Fully customizable (scientific, compute-intensive) web application template
http://cesium-ml.org/baselayer/
31 stars 18 forks source link

Properly ignore joins on Public attributes #223

Closed kmshin1397 closed 3 years ago

kmshin1397 commented 3 years ago

This PR is essentially a follow-up to Danny's PR #221. It reinforces the reasoning behind the changes there. Most of these changes are just from black formatting, the relevant changes are lines 480:482 and 587:591 (I'll highlight these in a comment).

Basically, in #221, the "ignore and continue if combining a Public attribute" logic was added after the class_attribute for said attribute was already joined into the query (here) so queries were still joining those in. The changes in lines 480:482 addresses this, and is responsible for the sample effects I'll describe below.

This came up in the query for accessible Classifications on a particular Obj, generated here. Currently, that query produces something like this:

SELECT classifications.id,
       classifications.created_at,
       classifications.modified,
       classifications.classification,
       classifications.taxonomy_id,
       classifications.probability,
       classifications.author_id,
       classifications.author_name,
       classifications.obj_id
FROM   classifications
JOIN
       (
              SELECT classifications_1.id  AS id
              FROM   classifications       AS classifications_1
              JOIN   group_classifications AS group_classifications_1
              ON     classifications_1.id = group_classifications_1.classification_id
              JOIN   groups
              ON     groups.id = group_classifications_1.group_id
              JOIN   group_users AS group_users_1
              ON     groups.id = group_users_1.group_id
              JOIN   users
              ON     users.id = group_users_1.user_id
              WHERE  users.id = %(id_1)s) AS anon_1
ON     anon_1.id = classifications.id
JOIN
       (
              SELECT classifications_2.id AS id
              FROM   classifications      AS classifications_2
              JOIN   taxonomies
              ON     taxonomies.id = classifications_2.taxonomy_id
              JOIN
                     (
                            SELECT taxonomies.id AS id
                            FROM   taxonomies
                            JOIN   group_taxonomy AS group_taxonomy_1
                            ON     taxonomies.id = group_taxonomy_1.taxonomie_id
                            JOIN   groups
                            ON     groups.id = group_taxonomy_1.group_id
                            JOIN   group_users AS group_users_2
                            ON     groups.id = group_users_2.group_id
                            JOIN   users
                            ON     users.id = group_users_2.user_id
                            WHERE  users.id = %(id_2)s) AS anon_3
              ON     anon_3.id = taxonomies.id
              JOIN   objs
              ON     objs.id = classifications_2.obj_id) AS anon_2
ON     anon_2.id = classifications.id
WHERE  classifications.obj_id = %(obj_id_1)s

Note the JOIN objs ON objs.id = classifications_2.obj_id) AS anon_2. When this is EXPLAIN'ed , you'll note a sequential scan on the entire Objs table like this: Screenshot 2021-04-14 155538

With the changes in this PR, you now get a query like this:

SELECT classifications.id,
       classifications.created_at,
       classifications.modified,
       classifications.classification,
       classifications.taxonomy_id,
       classifications.probability,
       classifications.author_id,
       classifications.author_name,
       classifications.obj_id
FROM   classifications
JOIN
       (
              SELECT classifications_1.id  AS id
              FROM   classifications       AS classifications_1
              JOIN   group_classifications AS group_classifications_1
              ON     classifications_1.id = group_classifications_1.classification_id
              JOIN   groups
              ON     groups.id = group_classifications_1.group_id
              JOIN   group_users AS group_users_1
              ON     groups.id = group_users_1.group_id
              JOIN   users
              ON     users.id = group_users_1.user_id
              WHERE  users.id = %(id_1)s) AS anon_1
ON     anon_1.id = classifications.id
JOIN
       (
              SELECT classifications_2.id AS id
              FROM   classifications      AS classifications_2
              JOIN   taxonomies
              ON     taxonomies.id = classifications_2.taxonomy_id
              JOIN
                     (
                            SELECT taxonomies.id AS id
                            FROM   taxonomies
                            JOIN   group_taxonomy AS group_taxonomy_1
                            ON     taxonomies.id = group_taxonomy_1.taxonomie_id
                            JOIN   groups
                            ON     groups.id = group_taxonomy_1.group_id
                            JOIN   group_users AS group_users_2
                            ON     groups.id = group_users_2.group_id
                            JOIN   users
                            ON     users.id = group_users_2.user_id
                            WHERE  users.id = %(id_2)s) AS anon_3
              ON     anon_3.id = taxonomies.id) AS anon_2
ON     anon_2.id = classifications.id
WHERE  classifications.obj_id = %(obj_id_1)s

On the same sample query on production that I posted the EXPLAIN screenshot above, this new query gives: image Note the total time of ~4ms instead of 1100ms

The changes in lines 587:591 is more of a preemptive optimization that I don't think has popped up yet. It applies the same sort of "ignore Public attributes" to ComposedAccessControl logic.

pep8speaks commented 3 years ago

Hello @kmshin1397! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 368:80: E501 line too long (80 > 79 characters)