apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.5k stars 13.76k forks source link

time filter applieed twice when using time series charts #19404

Closed moathOSA closed 9 months ago

moathOSA commented 2 years ago

im using time and pass it t my query when using the time charts you have to chose the time column when using the time column and the time filter the time condition will execute twice !

this is how the generated query look like

AND derived_tstamp >= '2022-02-28T00:00:00'
AND derived_tstamp < '2022-03-29T00:00:00'
group by domain_sessionid
order by user_sessions desc) AS virtual_table
WHERE time >= TO_TIMESTAMP('2022-02-28 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
AND time < TO_TIMESTAMP('2022-03-29 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
GROUP BY DATE_TRUNC('day', time)

this how my query


{# Render a blank line #}
{% if from_dttm != None %}
AND derived_tstamp >= '{{ from_dttm }}' AND derived_tstamp < '{{ to_dttm }}'
{% endif %}

superset vresion 1.4.1

Checklist

Make sure to follow these steps before submitting your issue - thank you!

rusackas commented 1 year ago

I'm not sure I'd call this a bug to have both conditions in your original query.

That said, I'm also not sure how the jinja situation plays into this.

Is this still an issue for you in newer versions of Superset, e.g. 2.0.1? Superset 1.4.1 is effectively no longer supported.

Pinging @ville and/or @nytai to see if you can make heads or tails of this.

pasqualtroncone commented 1 year ago

@rusackas Still relevant in 2.1.0. I'm not sure if there is a similar request to this one but I will explain it here.

Short explanation

I cannot or haven't found a way to remove from_dttm and to_dttm from outer query as we would do with get_filters('<filter name>', remove_filter=True).

Long explanation

Perhaps this is a bad example but I think it will illustrate very well what is going on.

Suppose that we want to show a bar chart that shows "number of packets grouped by date of receipt that have been sent in a specific date range by using the date filter"

So, having a virtual data set with the following query ( I use Clickhouse as DB)

SELECT `create_date`,
       `send_date`,
       `receive_date`,
       `package_id`
FROM `default`.`packages`
WHERE
  `send_date`> parseDateTimeBestEffort('{{ from_dttm }}')
  AND `send_date` <= parseDateTimeBestEffort('{{ to_dttm }}')

with create_date as default datetime column.

A bar chart with send_date as TIME COLUMN, Last year as TIME RANGE, a simple count(package_id) as part of the METRICS and receive_date as part of DIMENSIONS

image

will generate the following query

SELECT toStartOfMonth(toDateTime(`send_date`)) AS `__timestamp_ae6234`,
       `receive_date` AS `receive_date_04ccbe`,
       count(`package_id`) AS `COUNT(package_id)_818505`
FROM
  (SELECT `create_date`,
          `send_date`,
          `receive_date`,
          `package_id`
   FROM `default`.`packages`
   WHERE `send_date`> parseDateTimeBestEffort('2022-08-31T00:00:00')
     AND `send_date` <= parseDateTimeBestEffort('2023-08-31T00:00:00')) AS `virtual_table`
WHERE `create_date` >= toDate('2022-08-31')
  AND `create_date` < toDate('2023-08-31')
  AND `send_date` >= toDate('2022-08-31')
  AND `send_date` < toDate('2023-08-31')
GROUP BY `receive_date`,
         toStartOfMonth(toDateTime(`send_date`))
ORDER BY `COUNT(package_id)_818505` DESC
LIMIT 1000;

Lets brake that query down.

  1. send_dateis applied twice. There is no need for that.
  2. Why create_date (default datetime column of the Dataset) has to come into play? Chart will no use or filter anything by it. I am aware that if we don't mark any column as default datetime, the condition disappears from the WHERE section of the query, but once you have selected it you are dooming yourself forever unless you create a new dataset. I guess here is where @moathOSA stopped and decided to ask on github. (This last tip will probably solve your problem @moathOSA )

Some thoughts

It would be interesting if date filters behaved like any other filter. In this way we could define:

I hope this put some light on the road

round3d commented 1 year ago

I don't have this problem with the latest-dev and clickhouse. It is different than with 2.1 because GENERIC_CHART_AXES is now defaulted to true and with it time range is moved into regular filters. I'll check when I can with GENERIC_CHART_AXES false.

A dataset with multiple temporal fields with one as the x axis and another as the dimension works correctly with the filter and the generated sql doesn't produce criteria for both temporal fields. image

rusackas commented 9 months ago

Since we're now on 3.x version support, and 4.x is around the corner, plus it sounds like it was resolved based on the prior comment, I'll close this. If people are still facing an issue here on current versions, please file a new issue with a reproducible case using example data/dashboards if at all possible. The GENERIC_CHART_AXES flag hopefully solved this, and will be gone soon in 4.0.