druid-io / pydruid

A Python connector for Druid
Other
506 stars 194 forks source link

Question: Is it possible to work with timezones? #194

Closed tomas-kucera closed 4 years ago

tomas-kucera commented 4 years ago

Hi there, looking for a way to work with timezones. All timestamps are in UTC timezone. Running this query:

ts = query.timeseries(
datasource='events',
granularity={'type': 'period', 'period': 'P1D'},
intervals='1901-01-01T00:00:00-09:00/2101-01-01T00:00:00-09:00',
aggregations={'reqs': count('*')},
filter=Dimension('event_type') == 'request')

Result:

       reqs                 timestamp
0      7273  2019-11-01T00:00:00.000Z
1      4009  2019-11-02T00:00:00.000Z
2      5271  2019-11-03T00:00:00.000Z
3      4793  2019-11-04T00:00:00.000Z
4      3051  2019-11-05T00:00:00.000Z
..      ...                       ...
102  458187  2020-02-11T00:00:00.000Z
103  419327  2020-02-12T00:00:00.000Z
104  400851  2020-02-13T00:00:00.000Z
105  518169  2020-02-14T00:00:00.000Z
106  326337  2020-02-15T00:00:00.000Z

The counts are for UTC TZ, but I would like to get the following record counts in another TZ, such as America/Los Angeles (-08:00) TZ.

Doing it on the result is OK when doing it on a full dataset but as soon as I would specify a filter for for example a few days only, then for the first and the last days I get incorrect results.

So I am looking for a way to specify the filter in a different TZ and get the correct results. The filter setting in my example shows the TZ setting, which is actually ignored.

Any idea? Thanks!

MSadeghzadehG commented 4 years ago

I think you should use separate queries for each TZ, and then aggregate results for your use-case. you can use this link to specifying TZ for each query: https://druid.apache.org/docs/latest/querying/granularities.html#period-granularities

tomas-kucera commented 4 years ago

I think you should use separate queries for each TZ, and then aggregate results for your use-case. you can use this link to specifying TZ for each query: https://druid.apache.org/docs/latest/querying/granularities.html#period-granularities

@MSadeghzadehG - You made my day! The following code:

ts = query.timeseries(
datasource='adela-events',
granularity={'type': 'period', 'period': 'P1D', 'timeZone': 'America/Los_Angeles'},
intervals='1901-01-01T00:00:00+00:00/2101-01-01T00:00:00+00:00',
aggregations={'reqs': count('*')},
filter=Dimension('entry_type') == 'req')

produced this nice result (notice the TZ change on 2019-11-03 - DST ;) ):

       reqs                      timestamp
0      1252  2019-10-31T00:00:00.000-07:00
1      6307  2019-11-01T00:00:00.000-07:00
2      4014  2019-11-02T00:00:00.000-07:00
3      5673  2019-11-03T00:00:00.000-07:00
4      4612  2019-11-04T00:00:00.000-08:00
..      ...                            ...
104  402497  2020-02-12T00:00:00.000-08:00
105  483927  2020-02-13T00:00:00.000-08:00
106  408150  2020-02-14T00:00:00.000-08:00
107  331610  2020-02-15T00:00:00.000-08:00
108  324127  2020-02-16T00:00:00.000-08:00

Thanks!

MSadeghzadehG commented 4 years ago

You’re welcome.