apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.9k stars 13.2k forks source link

Non necessary self join in Area chart query could bring performance issues #15470

Closed ecpbi closed 5 months ago

ecpbi commented 3 years ago

Hello,

I implemented a relational schema on a 10.5.11 MariaDB database to store the data of my data warehouse and I installed the version 0.999.0dev of Superset (this is what tells me the "About" item of the "Settings" menu).

I just tried the area chart and I noticed the following strange behavior of the query generator. It creates a non needed self join. Here is what I did and what was generated. First, I created avirtual SQL dataset from a query that works perfectly well on my DB.

_SELECT va.application_host , va.application_pid , va.application_port , usr.appuserlogin , apps.EventDate , apps.EventTime , vs.service_host , vs.service_pid , vs.service_port , ori.origine_host , ori.origine_pid , ori.origine_port , oc.operation_categorie , oc.operation , oc.WHATSUCCESS , oc.objet_categorie , apps.objet , apps.PERFDATA FROM application_supervision AS apps NATURAL JOIN appuser AS usr NATURAL JOIN v_application AS va NATURAL JOIN v_service AS vs NATURAL JOIN v_origine AS ori NATURAL JOIN objetcategorie AS oc

Then I used this dataset to create a chart drawing an area chart. image

Finaly, I ran the query to get the chart on my screen, that works perfectly, maybe because my tables count a very low number of rows. After the rendering of the chart, I look at the query that is generated by the tool and I get the following.

_SELECT DATE(EventDate) AS timestamp, application_host AS application_host, application_port AS application_port, SUM(PERFDATA) AS nb transactions FROM (SELECT va.application_host , va.application_pid , va.application_port , usr.appuserlogin , apps.EventDate , apps.EventTime , vs.service_host , vs.service_pid , vs.service_port , ori.origine_host , ori.origine_pid , ori.origine_port , oc.operation_categorie , oc.operation , oc.WHATSUCCESS , oc.objet_categorie , apps.objet , apps.PERFDATA FROM application_supervision AS apps NATURAL JOIN appuser AS usr NATURAL JOIN v_application AS va NATURAL JOIN v_service AS vs NATURAL JOIN v_origine AS ori NATURAL JOIN objet_categorie AS oc) AS virtual_table INNER JOIN (SELECT application_host AS application_host, application_port AS application_port, SUM(PERFDATA) AS mme_inner FROM (SELECT va.application_host , va.application_pid , va.application_port , usr.appuserlogin , apps.EventDate , apps.EventTime , vs.service_host , vs.service_pid , vs.service_port , ori.origine_host , ori.origine_pid , ori.origine_port , oc.operation_categorie , oc.operation , oc.WHATSUCCESS , oc.objet_categorie , apps.objet , apps.PERFDATA FROM application_supervision AS apps NATURAL JOIN appuser AS usr NATURAL JOIN v_application AS va NATURAL JOIN v_service AS vs NATURAL JOIN v_origine AS ori NATURAL JOIN objet_categorie AS oc) AS virtual_table WHERE ((objet_categorie ='Transactions' AND application_host like 'fr1btp260%')) GROUP BY application_host, application_port ORDER BY mme_inner__ ASC LIMIT 100) AS anon_1 ON application_host = application_host AND application_port = application_port WHERE ((objet_categorie ='Transactions' AND application_host like 'fr1btp260%')) GROUP BY application_host, application_port, DATE(EventDate) LIMIT 10000;

I understand tue definition of the virtual_table derived table, as only a few columns of the dataset are used, but what is the utility of the anon_1 derived table ? This derived table is just useless as I get exactly the same result by simply dropping it from the query. Not only is this self join not usefull, but it has a strong probability to bring performance issuses when the data grow up. Maybe this has a purpose, but, from my point of view, this is only a bug.

Regards,

AlmiS commented 2 years ago

This seems to be the result of setting the "series limit" option. If you hover the little info symbol on this option, you will see the description which explains this behaviour.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

Shazad-khan commented 1 year ago

I treid reproducing the issue but it works fine for me please recheck the issue and close the issue if already fixed

ecpbi commented 1 year ago

Hi Shazad-Khan, Thank you for your efforts. I actually did not fully understand this feature. This works fine.

rusackas commented 5 months ago

Sounds like this one should have been closed. Happy to re-open if there's more that needs discussion.