exxeleron / qPython

interprocess communication between Python and kdb+
http://www.devnet.de
Apache License 2.0

Make timestamps be naive #14

Closed cpcloud closed 9 years ago

maciejlach commented 9 years ago

Could you please describe the timezone issues you are experiencing?

We'd like to keep pandas support optional, and therefore avoid importing it outside of the qpython._pandas module.

cpcloud commented 9 years ago

NumPy by default chooses the local timezone. Pandas Timestamps, on the other hand, are TZ naive by default.

In [1]: from qpython.qconnection import QConnection as QConn

In [2]: conn = QConn(host='localhost', port=5000)

In [3]: conn.open()

In [4]: conn.sync('([] 2010.10.01D20:03:10 + til 1)', pandas=True).x
Out[4]:
0   2010-10-02 01:03:10
Name: x, dtype: datetime64[ns]

cpcloud commented 9 years ago

I'm happy to put a try block around the import and catch the ImportError, but then you'll have an inconsistency in the way timezones are handled.

cpcloud commented 9 years ago

See this SO post: http://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64

NumPy timezone support is simply broken.

maciejlach commented 9 years ago

I switched the epoch representation to a naive raw representation: an integer number of millis/nanos.

jreback commented 9 years ago

Your change doesn't fix the problem. You STILL have numpy giving you results in your local time zone. There IS NO WAY to fix this using numpy. It doesn't have the concept of naive timestamps (ATM).

(Pdb) numpy.datetime64('2000-01-04T05:36:57.600Z')
numpy.datetime64('2000-01-04T00:36:57.600-0500')

maciejlach commented 9 years ago

I'm aware that numpy.datetime64 values are not truly naive: internally they are stored as a number of units since the POSIX epoch and are printed in the local timezone.

I switched the representation of the q epoch for datetime/timestamp from a numpy.datetime64 created from a string (which seemed to be interpreted in the local timezone) to one created from an integer value (interpreted as UTC by numpy).

It yields the following results:

>>> print q('([] 2010.10.01D20:03:10 + til 1)').x
[339278590000000000]
>>> print q('([] 2010.10.01D20:03:10 + til 1)', numpy_temporals=True).x
['2010-10-01T22:03:10.000000000+0200']
>>> print q('([] 2010.10.01D20:03:10 + til 1)', pandas=True).x
0   2010-10-01 20:03:10
Name: x, dtype: datetime64[ns]
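
For readers following along, here is a minimal sketch of the integer-based construction described above. It is not qPython's actual code: the constant name `Q_EPOCH_OFFSET_NS` is made up for illustration, and the only inputs are the raw value printed above and the fact that q counts timestamps in nanoseconds from 2000-01-01 while NumPy counts from 1970-01-01.

```python
import numpy as np

# Hypothetical constant (not taken from qPython): nanoseconds between the
# POSIX epoch (1970-01-01) and the q epoch (2000-01-01), both at midnight UTC.
Q_EPOCH_OFFSET_NS = 946684800 * 10**9

# Raw q timestamp from the output above: nanoseconds since 2000-01-01.
raw = np.array([339278590000000000], dtype=np.int64)

# Shift to the POSIX epoch using integer arithmetic only, then reinterpret the
# integers as datetime64[ns]. No string parsing is involved, so no timezone
# guess is applied while the values are built.
stamps = (raw + Q_EPOCH_OFFSET_NS).astype("datetime64[ns]")
```

The stored integers are the same regardless of how a given NumPy version chooses to print them, which is why the pandas result above shows 20:03:10 while the NumPy repr shows the local-time rendering.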

As an alternative approach for the pandas conversion, we can skip the intermediate step of converting the raw integer vector from q to numpy.datetime64 and call pandas.to_datetime() directly on the raw integer vector.
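
A rough sketch of what that alternative could look like, assuming a pandas version whose `to_datetime()` accepts the `origin` argument (added after this thread was written). The variable names are illustrative only, and q null or infinity values are not handled here.

```python
import numpy as np
import pandas as pd

# Raw q timestamp vector: nanoseconds since the q epoch (2000-01-01).
raw = np.array([339278590000000000], dtype=np.int64)

# Interpret the integers as nanosecond offsets from the q epoch.
# The result is a timezone-naive DatetimeIndex.
stamps = pd.to_datetime(raw, unit="ns", origin=pd.Timestamp("2000-01-01"))

# Wrap in a Series named like the column in the examples above.
x = pd.Series(stamps, name="x")
```

Because the conversion never passes through a numpy.datetime64 string representation, the local timezone does not enter the picture at any point.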