apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.62k stars 13.82k forks source link

[welcome page][perf] Slow queries to generate welcome page #16528

Open graceguo-supercat opened 3 years ago

graceguo-supercat commented 3 years ago

How to reproduce the bug

  1. As user using Superset for years,
  2. Go to https://superset.d.musta.ch/superset/welcome/
  3. Welcome page show spinner very, very long time: kpFvNcc17A
  4. from MySQL Slow query log, I can see the query to get recent viewed items can take > 20 mins.

Environment

airbnb production

Checklist

Make sure to follow these steps before submitting your issue - thank you!

Additional context

Add any other context about the problem here.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

cesar-loadsmart commented 2 years ago

We saw that some of the queries used to fill out the welcome page access the logs table at the metadata DB. In our case, this table didn't have an index. In other words, it's fine for test environments, but as Superset grows in your company this might became a problem:

image

D3nn3 commented 2 years ago

Can confirm this. Saw the same query in our Postgres DB logs taking almost 10s.

cesar-loadsmart commented 2 years ago

We've fixed our problem by creating a specific index for this query:

SELECT anon_1.dashboard_id AS anon_1_dashboard_id, anon_1.slice_id AS anon_1_slice_id, anon_1.action AS anon_1_action, anon_1.dttm AS anon_1_dttm, dashboards.slug AS dashboard_slug, dashboards.dashboard_title AS dashboards_dashboard_title, slices.slice_name AS slices_slice_name 
FROM (SELECT logs.dashboard_id AS dashboard_id, logs.slice_id AS slice_id, logs.action AS action, max(logs.dttm) AS dttm 
FROM logs 
WHERE logs.action IN ('explore', 'dashboard') AND logs.user_id = <USERID> AND logs.dttm > '2021-08-24T17:20:50.496891'::timestamp AND (logs.dashboard_id IS NOT NULL OR logs.slice_id IS NOT NULL) GROUP BY logs.dashboard_id, logs.slice_id, logs.action) AS anon_1 LEFT OUTER JOIN dashboards ON dashboards.id = anon_1.dashboard_id LEFT OUTER JOIN slices ON slices.id = anon_1.slice_id 
WHERE dashboards.dashboard_title != '' OR slices.slice_name != '' ORDER BY anon_1.dttm DESC 
 LIMIT 6

Our Superset became faster than it was released, today we have around 400 users and more than 100 published dashboards.

jplanckeel commented 2 years ago

Hello, we have created a purge for the log but it is slow because the logs table is constantly called. This has been fixed by creating an index on the dttm column of the logs table.

Can we add an index on next release ?

our PR for add purge on event logger ;) https://github.com/apache/superset/pull/20636

CREATE INDEX "dttm_index"
ON "logs" ("dttm");
chathawee commented 1 year ago

We've fixed our problem by creating a specific index for this query:

SELECT anon_1.dashboard_id AS anon_1_dashboard_id, anon_1.slice_id AS anon_1_slice_id, anon_1.action AS anon_1_action, anon_1.dttm AS anon_1_dttm, dashboards.slug AS dashboard_slug, dashboards.dashboard_title AS dashboards_dashboard_title, slices.slice_name AS slices_slice_name 
FROM (SELECT logs.dashboard_id AS dashboard_id, logs.slice_id AS slice_id, logs.action AS action, max(logs.dttm) AS dttm 
FROM logs 
WHERE logs.action IN ('explore', 'dashboard') AND logs.user_id = <USERID> AND logs.dttm > '2021-08-24T17:20:50.496891'::timestamp AND (logs.dashboard_id IS NOT NULL OR logs.slice_id IS NOT NULL) GROUP BY logs.dashboard_id, logs.slice_id, logs.action) AS anon_1 LEFT OUTER JOIN dashboards ON dashboards.id = anon_1.dashboard_id LEFT OUTER JOIN slices ON slices.id = anon_1.slice_id 
WHERE dashboards.dashboard_title != '' OR slices.slice_name != '' ORDER BY anon_1.dttm DESC 
 LIMIT 6

Our Superset became faster than it was released, today we have around 400 users and more than 100 published dashboards.

Hi @cesar-loadsmart Could you please provide the statement to create the Index? Right now, we are facing this issue also. Could you please provide me the create statement? Thanks in advance.

cesar-loadsmart commented 1 year ago

Hi @chathawee, I believe these two indexes can help you:

CREATE INDEX logs_user_id_dttm_idx ON public.logs USING btree (user_id, dttm)

and

CREATE INDEX logs_user_id_dttm_idx1 ON public.logs USING btree (user_id, dttm) WHERE (((action)::text = ANY ((ARRAY['explore'::character varying, 'dashboard'::character varying])::text[])) AND ((dashboard_id IS NOT NULL) OR (slice_id IS NOT NULL)))

Please test them and let me know.

chathawee commented 1 year ago

Hi @chathawee, I believe these two indexes can help you:

CREATE INDEX logs_user_id_dttm_idx ON public.logs USING btree (user_id, dttm)

and

CREATE INDEX logs_user_id_dttm_idx1 ON public.logs USING btree (user_id, dttm) WHERE (((action)::text = ANY ((ARRAY['explore'::character varying, 'dashboard'::character varying])::text[])) AND ((dashboard_id IS NOT NULL) OR (slice_id IS NOT NULL)))

Please test them and let me know.

Cheers!

D3nn3 commented 10 months ago

This is still a problem and all PR tackling this issue (#20636 and #21873) have been closed. Are there any plans to properly fix the underlying issue here?

rusackas commented 6 months ago

Is anyone still facing this? It's been quiet for quite a long time. CC @john-bodley to see if the original reporting org has metrics here.

D3nn3 commented 6 months ago

Can still reproduce this with 3.1 @rusackas

We didn't try some of the proposed fixes (e.g. creating an index for specific tables), so this is from a vanilla instance (but upgraded from v1 -> v2 -> v3 and not a fresh install of Superset v3).

image