exxeleron / qPython

interprocess communication between Python and kdb+
http://www.devnet.de
Apache License 2.0
151 stars 90 forks source link

pandas DataFrame drops time index when `q('set', numpy.string_('varname'), df) #23

Closed danielkrizian closed 9 years ago

danielkrizian commented 9 years ago

When trying to pass from Pandas to q, the time index in DataFrame (in the column Date) doesn't quite make it to the q side:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
#2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
#2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
#2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
#2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
#2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#    Open  High  Low  Close    Volume  Adj Close
#0 10.17 10.28 10.05 10.28  60855800       9.43 
#1 10.45 11.24 10.40 10.96 215620200      10.05
#2 11.21 11.46 11.13 11.37 200070600      10.43
#3 11.46 11.69 11.32 11.66 130201700      10.69
#4 11.67 11.74 11.46 11.69 130463000      10.72

SO thread: http://stackoverflow.com/questions/28385137/pandas-dataframe-drops-index-when-passing-to-kdb-using-qpython-api

By all means, thank you for excellent qPython package!

maciejlach commented 9 years ago

While serializing DataFrame objects the qPython checks for the presence of meta attribute. If the attribute is not present, DataFrame is serialized as q table and index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta attribute and provide type hinting to enforce representation a q keyed table.

Please take a look at the modified sample:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

from qpython import MetaData
from qpython.qtype import QKEYED_TABLE

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date                                                         
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72
danielkrizian commented 9 years ago

That's helpful, thanks. I couldn't find the hint about MetaData or QKEYED_TABLE in the docs, so I asked here. Thanks