KxSystems / pykx

PyKX is a Python-first interface to the world's fastest time-series database, kdb+, and its underlying vector programming language, q.
https://code.kx.com/pykx

[Pandas API] Unexpected behaviour for max, min, prod and sum. #18

Closed: neutropolis closed this issue 5 months ago

neutropolis commented 6 months ago

Describe the bug

The functions max, min, prod and sum raise a length error when invoked on a keyed table.

To Reproduce

>>> import pykx as kx
>>> t = kx.q('([a:1 2]b:3 4;c:5 6)')
>>> t
pykx.KeyedTable(pykx.q('
a| b c
-| ---
1| 3 5
2| 4 6
'))
>>> t.max()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/j/Github/hablapps/pykx/pykx-dev/lib/python3.9/site-packages/pykx/pandas_api/__init__.py", line 57, in return_val
    res = func(*args, **kwargs)
  File "/Users/j/Github/hablapps/pykx/pykx-dev/lib/python3.9/site-packages/pykx/pandas_api/pandas_meta.py", line 80, in inner
    return q('{[x; y] y!x}', res, cols)
  File "/Users/j/Github/hablapps/pykx/pykx-dev/lib/python3.9/site-packages/pykx/embedded_q.py", line 226, in __call__
    return factory(result, False)
  File "pykx/_wrappers.pyx", line 507, in pykx._wrappers._factory
  File "pykx/_wrappers.pyx", line 500, in pykx._wrappers.factory
pykx.exceptions.QError: length

Expected behavior

In my view, these methods should behave as follows:

>>> import pykx as kx
>>> t = kx.q('([a:1 2]b:3 4;c:5 6)')
>>> t
pykx.KeyedTable(pykx.q('
a| b c
-| ---
1| 3 5
2| 4 6
'))
>>> t.max()
pykx.Dictionary(pykx.q('
b| 4
c| 6
'))

Additional context

This problem originates in the preparse_computations function of the Pandas API: https://github.com/KxSystems/pykx/blob/main/src/pykx/pandas_api/pandas_meta.py#L52-L70. This function filters the key columns out of the keyed table's data but still returns the original column list, keys included. Later, when the @convert_result decorator runs, it hits a length mismatch, since there are more column names than results. The same problem should also affect the all and any methods of the Pandas API, but they cannot be reached because they are overridden by the KeyedTable class: https://github.com/KxSystems/pykx/blob/main/src/pykx/wrappers.py#L3291-L3295.
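
To make the mismatch concrete, here is a minimal pure-Python sketch of the mechanism described above. It is a toy model, not the pykx source: the dict-based table layout and the names preparse_as_described, preparse_unkeyed_cols and table_max are hypothetical, and convert_result here only imitates the length check that q's `y!x` performs.

```python
# Toy model of the failure described above; this is NOT the pykx implementation.
# A keyed table is modelled as two plain dicts: key columns and value columns.
keyed_table = {
    "keys": {"a": [1, 2]},
    "values": {"b": [3, 4], "c": [5, 6]},
}


def preparse_as_described(tab):
    """Aggregate over value columns only, but report ALL column names."""
    cols = list(tab["keys"]) + list(tab["values"])  # key column 'a' is included
    data = list(tab["values"].values())             # only 'b' and 'c' data
    return data, cols


def preparse_unkeyed_cols(tab):
    """Hypothetical fix: report only the names of the aggregated columns."""
    cols = list(tab["values"])
    data = list(tab["values"].values())
    return data, cols


def convert_result(res, cols):
    """Stand-in for q's `y!x`: names and results must have equal length."""
    if len(res) != len(cols):
        raise ValueError("length")
    return dict(zip(cols, res))


def table_max(tab, preparse):
    """Column-wise max, pairing each result with the reported column name."""
    data, cols = preparse(tab)
    return convert_result([max(col) for col in data], cols)


try:
    table_max(keyed_table, preparse_as_described)
except ValueError as err:
    print("error:", err)  # three names, two results

print(table_max(keyed_table, preparse_unkeyed_cols))
```

In this model, returning only the non-key column names is enough to produce the b/c dictionary shown under the expected behavior; whether the real fix belongs in preparse_computations or elsewhere is for the maintainers to decide.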