Speed up stats - Githubissues

haneslinger commented 2 months ago

IMPORTANT

the lastest commit, alt, is a version that'll only run quick when you're extra columns are sparely populated (i think), like on dev1, BPS- DC, cycle 2016. it's odd,the commit before that might be better

alot stuff we weren't using anymore, however, most improvement comes from rewriting the queries.

cuts it in half, but it still takes along time.

Far and away, the most expensive part it getting the count of states that have each column populated. I rewrote the query to be more DB heavy, which sped it up, but it's still rather expensive it runs this query for every column (diff column name, of course):

SELECT COUNT(*) AS "__count" FROM "seed_propertystate" WHERE ("seed_propertystate"."id" IN (SELECT U0."state_id" FROM "seed_propertyview" U0 INNER JOIN "seed_property" U2 ON (U0."property_id" = U2."id") INNER JOIN "orgs_accesslevelinstance" U3 ON (U2."access_level_instance_id" = U3."id") WHERE (U0."cycle_id" = 647 AND U3."lft" >= 1 AND U3."rgt" <= 2 AND U2."organization_id" = 412)) AND NOT ("seed_propertystate"."total_ghg_emissions" IS NULL)); args=(647, 1, 2, 412)

and this query for every extra data column:

SELECT COUNT(*) AS "__count" FROM "seed_propertystate" WHERE ("seed_propertystate"."id" IN (SELECT U0."state_id" FROM "seed_propertyview" U0 INNER JOIN "seed_property" U2 ON (U0."property_id" = U2."id") INNER JOIN "orgs_accesslevelinstance" U3 ON (U2."access_level_instance_id" = U3."id") WHERE (U0."cycle_id" = 647 AND U3."lft" >= 1 AND U3."rgt" <= 2 AND U2."organization_id" = 412)) AND "seed_propertystate"."extra_data" ? 'geojson_Maxlat' AND NOT (("seed_propertystate"."extra_data" -> 'geojson_Maxlat') = 'null')); args=(647, 1, 2, 412, 'geojson_Maxlat', 'geojson_Maxlat', 'null')

now, while I dont think there's way around performing a group by for every column, I bet there a way to make a temporary table out of the bit they all share, the bit that gets the relevant states.

FROM "seed_propertystate" WHERE ("seed_propertystate"."id" IN (SELECT U0."state_id" FROM "seed_propertyview" U0 INNER JOIN "seed_property" U2 ON (U0."property_id" = U2."id") INNER JOIN "orgs_accesslevelinstance" U3 ON (U2."access_level_instance_id" = U3."id") WHERE (U0."cycle_id" = 647 AND U3."lft" >= 1 AND U3."rgt" <= 2 AND U2."organization_id" = 412))

I just dont know how.

github-actions[bot] commented 2 months ago

Label error. Requires at least 1 of: Feature, Bug, Enhancement, Maintenance, Documentation, Performance, Do not publish. Found:

github-actions[bot] commented 2 months ago

Label error. Requires at least 1 of: Feature, Bug, Enhancement, Maintenance, Documentation, Performance, Do not publish. Found:

kflemin commented 2 months ago

documenting the current plan for this ticket: awaiting feedback from @ebeers-png. Otherwise we will go with alt, deploy to dev2 and do some testing there with orgs that have many extra data fields.

axelstudios commented 2 months ago

I updated this to only need 1 query for extra_data columns, and 1 query for canonical columns, resulting in an 8.7x speed up over the improvements that Hannah already implemented. The number of queries has gone from 141 to 10 (4 of which are just overhead from the decorators).

In total, the stats request is now 35x faster than it used to be, and only takes 436ms to load for a large org.

Note

Previously, the counts discarded null and zero values. This PR only excludes null values from the counts
The internal column normalized_address was incorrectly being returned in the results, now the results correctly match the columns
This PR also fixes the Number of Extra Data Fields count

SEED-platform / seed

Speed up stats #4636

IMPORTANT