[pivot table] Query with ROW LIMIT

graceguo-supercat commented 3 years ago

How to reproduce the bug

Create a PivotTable chart,
Add ROW LIMIT = 200
Check generated query by click VIEW QUERY

Actual results

SELECT "dim_destination_geo" AS "dim_destination_geo",
       date_trunc('week', CAST("night_in_latest_year" AS TIMESTAMP)) AS "night_in_latest_year",
       ...
FROM "tmp"."forward_occupancy_forecast_superset_table"
WHERE "night_in_latest_year" >= '2021-03-11 00:00:00.000000'
  AND "night_in_latest_year" < '2021-06-15 00:00:00.000000'
ORDER BY "Nights Backlog" DESC
LIMIT 200;

Instead of filtering out the top K rows in the superset view, it filters out the top K rows in the input data; this results in the pivot table having lots of missing cells.

Expected results

Superset should generate a sub-query with limit like this:

SELECT "dim_destination_geo" AS "dim_destination_geo",
       date_trunc('week', CAST("night_in_latest_year" AS TIMESTAMP)) AS "night_in_latest_year",
       ...
FROM "tmp"."forward_occupancy_forecast_superset_table"
WHERE "night_in_latest_year" >= '2021-03-11 00:00:00.000000'
  AND "night_in_latest_year" < '2021-06-15 00:00:00.000000'
       AND (dim_destination_geo IN
              (SELECT dim_destination_geo
               FROM
                 (SELECT dim_destination_geo,
                         sum(current_forward_nights_occupied)
                  FROM tmp.forward_occupancy_forecast_superset_table
                  WHERE is_most_recent_view = True
                  GROUP BY 1
                  order by 2 DESC
                  LIMIT 200))))
ORDER BY "Nights Backlog" DESC
LIMIT 50000;

Environment

latest master branch

Checklist

Make sure to follow these steps before submitting your issue - thank you!

[x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
[x] I have reproduced the issue with at least the latest released version of superset.
[x] I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

Add any other context about the problem here. cc @junlincc @zuzana-vej

junlincc commented 3 years ago

Hi @graceguo-supercat, can you attach a screenshot with all the control select. I assume you set filters and groud-by as well?

Couple PRs touched Pivot table lately. may need some help debugging. sorry about the regression. https://github.com/apache-superset/superset-ui/pull/954 https://github.com/apache/superset/pull/13057 (more likely?)

cc @villebro

villebro commented 3 years ago

I don't believe this is a regression - this has most likely always been the case. I agree that the proposal is the correct way to filter the data. I believe we should be able to do this by leveraging the timeseries_limit (although I'd really like to refactor this so that the feature isn't related to timeseries): https://github.com/apache/superset/blob/5fca19da565cae1e70a0dff66007828dbe9fc8ed/superset/connectors/sqla/models.py#L1177-L1224

graceguo-supercat commented 3 years ago

@junlincc this is controls list:

john-bodley commented 3 years ago

@villebro are there any concerns of merely adding a SERIES LIMIT control per your suggestion?

john-bodley commented 3 years ago

Actually @villebro adding the SERIES LIMIT might not be the right approach, i.e., when prototyping the change the limit adheres to the row limit of the groupings of both the GROUP BY and COLUMNS which is likely not what the user is after. Grepping through the code it seems non trivial to decouple this logic.

apache / superset