Is it possible to return Python output into KDB?

trias702 commented 6 years ago

Hello,

Using PyQ from inside a running q process, is it possible at all to return evaluated data from Python in an anonymous and dynamic way?

For example (from q):

q)show .p.e "6*7"

I know that I can send data from Python into q using q global variables as follows:

q)p)q.my_variable = 6*7
q)show my_variable

But this is a very inelegant solution, and would not scale well for large data science operations.

Ideally, I would like something like embedPy's .p.pyeval, but for PyQ. Is there anything like this I could use?

I would use embedPy if I could, but I'm on Windows, and while I can get the p.dll from embedPy to compile, it always crashes whenever I try to use it. PyQ I can successfully compile and operate on Windows.

I asked a similar question last year, and at the time this was the final answer from @abalkin:

> If you want to use .p.e or p) syntax, then you are restricted to communication via global variables.

Is this restriction still the case for PyQ 4.1.2?

abalkin commented 6 years ago

The default .p.e still does not return values to q, but support for functions exported from Python to q has been extended in recent versions and you can implement your own pyeval function as follows:

>>> def pyeval(x):
...     return eval(str(x))
...
>>> q.pyeval_ = pyeval
>>> q()
q)pyeval:{pyeval_ enlist x}
q)pyeval "1+2"
3
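
The pyeval above evaluates in whatever scope eval is called from. A minimal sketch of a variant that keeps state in one dedicated namespace, so that names defined by one call are visible to the next. `_namespace` and `pyexec` are illustrative names that are not part of PyQ, and the export step (`q.pyeval_ = pyeval`) is the same as in the snippet above and is left out so that the block runs in plain Python:

```python
# Shared namespace for all evaluated code (hypothetical helper state, not a PyQ feature).
_namespace = {}


def pyeval(source):
    """Evaluate a Python expression given as text and return its value.

    str() mirrors the original snippet, where the argument arrives as a q object.
    """
    return eval(str(source), _namespace)


def pyexec(source):
    """Run Python statements (assignments, imports) in the same namespace."""
    exec(str(source), _namespace)


# Statements go through pyexec, expressions through pyeval.
pyexec("import math; radius = 2")
area = pyeval("math.pi * radius ** 2")
```

Both helpers execute arbitrary code, so they should only ever be given trusted input.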

You may also take a look at embedPy, where returning values from Python to q is taken to a whole new level. In the next release of PyQ we are planning to allow optional installation of embedPy in the same kdb+ setup. See #30.

trias702 commented 6 years ago

Many thanks for the pyeval workaround, I really appreciate it!

Btw, I would very much love to use embedPy, but it sadly seems to have severe incompatibilities with Windows which PyQ doesn't have. With mingw64 GCC, PyQ is very easy to compile with full functionality, but embedPy is not.

I spent the entire day today rewriting parts of p.c for embedPy to get it working on Windows with mingw64 gcc, but without success: it compiles easily but does not work. I think the problem is the dyl() call, which no longer uses LoadLibraryA from windows.h the way PyQ does. My only other theory is the pthread.h calls, but mingw64 has supplied win32 pthreads for years, so I doubt that is the issue.

Either way, if you could please look into possibly bringing some of the embedPy code closer in line with PyQ, at least for dyl(), I would really greatly appreciate it!

abalkin commented 6 years ago

> I would very much love to use embedPy, but it sadly seems to have severe incompatibilities with Windows which PyQ doesn't have.

The Kx team is working on this. See also KxSystems/embedPy#17.

abalkin commented 6 years ago

Ref: internal tracker gl-628.

trias702 commented 6 years ago

Sorry, another question regarding function bindings in PyQ. I'm trying to create a nice, elegant way of sending q tables to Python as pandas DataFrames from within q. I have come up with the following solution, but I'm not sure why it is not working:

cat test.p
from pyq import q,K
import numpy as np
import pandas as pd

# define python to q comms funcs
def q_set(name, value):
    globals()[str(name)] = value
    return K(str(name))

def q_qt2df(x):
    return pd.DataFrame(dict(x.flip))

q.set('.p.set', q_set)
q.set('.p.qt2df', q_qt2df)

Then from q:

q)\l test.p
q)x:([]c1:1000?1f;c2:1000?100)
q).p.set ("Xp"; .p.qt2df x)
q).p.e "print(type(Xp))"
<class 'pyq.K'>       // was expecting to get pandas.core.frame.DataFrame

Why is Xp in Python still a pyq.K object when it passes through the .p.qt2df function which returns a pandas DataFrame?

I know that I can do the conversion entirely on the Python side, like so:

q).p.set ("Xp"; x)
q).p.e "Xp = pd.DataFrame(dict(Xp.flip))"

But this is very inelegant. Ideally, I would like to send q tables through to Python as DataFrames in one step.

abalkin commented 6 years ago

> I would like to send q tables through to Python as DataFrames in one step.

This would be against the PyQ philosophy, which says that data should gravitate towards q. This is why conversion from Python to q is implicit: once your qt2df is exported to q as .p.qt2df, its return value is converted to a q object. Note that embedPy does not follow the same philosophy and uses a recent undocumented feature in q that allows wrapping "foreign" objects (such as pandas DataFrames) in a special q datatype.
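
Given that rule, one possible way to get one-step behaviour is to convert inside the exported function, so that the converted object is stored on the Python side and never travels back to q. A hedged sketch: `make_setter`, `to_rows` and `store` are hypothetical names, and the stand-in converter works on a plain dict so that the block runs without PyQ. In PyQ the converter would presumably be the `pd.DataFrame(dict(x.flip))` expression from the question, and the resulting function would be exported with `q.set` as before.

```python
def make_setter(convert, namespace):
    """Build a function that converts a value and stores it under a name.

    The conversion happens on the Python side, so only the name is returned
    to the caller; the converted object never crosses the boundary.
    """
    def setter(name, value):
        namespace[str(name)] = convert(value)
        return str(name)
    return setter


# Stand-in for a q table: a mapping from column names to column values.
table = {"c1": [0.1, 0.2, 0.3], "c2": [7, 8, 9]}


def to_rows(columns):
    """Stand-in converter: turn column-oriented data into a list of row tuples."""
    return list(zip(*columns.values()))


store = {}  # in the thread's test.p this role is played by globals()
set_rows = make_setter(to_rows, store)
returned = set_rows("Xp", table)
```

Because the setter returns only the name, nothing that needs to stay a Python object is handed back to q.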

For PyQ users, we recommend that most data manipulation be performed on data stored in q tables and that only final results be converted to pandas for visualization.

Note that q table columns can be manipulated directly by numpy. For example:

q)t:([]a:5?10f;b:0f)
q)t
a        b
----------
3.017723 0
7.85033  0
5.347096 0
7.111716 0
4.11597  0
q)p)import numpy
q)p)numpy.log10(q.t.a, numpy.asarray(q.t.b))
q)t
a        b
------------------
3.017723 0.4796793
7.85033  0.8948879
5.347096 0.728118
7.111716 0.8519744
4.11597  0.6144722
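
The in-place update works because the second positional argument of a NumPy ufunc such as numpy.log10 is its output array, and numpy.asarray avoids copying when it can. A small NumPy-only sketch of the same mechanism, with ordinary arrays standing in for the q columns (no PyQ involved, so the variable names are illustrative):

```python
import numpy as np

a = np.array([1.0, 10.0, 100.0])   # stand-in for column a
b = np.zeros(3)                    # stand-in for column b

# asarray does not copy an existing ndarray, so `target` shares memory with b.
target = np.asarray(b)
np.log10(a, target)                # second positional argument is `out`

# np.array copies by default, so writing into the copy leaves b untouched.
copy = np.array(b)
copy[:] = -1.0
```

After this runs, `b` holds the logarithms while `copy` is an independent array.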

That said, we are working on incorporating embedPy into PyQ, and in the future you should be able to manipulate pandas DataFrames from q without conversion.

trias702 commented 6 years ago

That's fair enough, thank you for the explanation.

What about sending data from q into Python as a numpy matrix in one step, is that possible?

So I have an in-memory table in q, which I do all data manipulation on. Now I want to send it to Python as a numpy matrix so that I can feed it into SkLearn/Keras and do some .p.e commands on it. Would it be possible to perform this action in one step similar to my attempt with Pandas earlier?

I note that the conversion syntax (in Python) from q table to DataFrame is:

pandas.DataFrame(dict(q_table_object.flip))

Is there a similar pattern for q table to numpy matrix?

abalkin commented 6 years ago

> What about sending data from q into Python as a numpy matrix in one step, is that possible?

Is this what you are looking for?

q)x:2 3#til 6
q)p)import numpy
q)p)print(numpy.array(q.x))
[[0 1 2]
 [3 4 5]]

Note that this creates a copy of your data.

If your data is in a table, you can do something like this:

q)t:([]a:0 1 2;b:3 4 5)
q)p)print(numpy.array(q.t.flip.value))
[[0 1 2]
 [3 4 5]]

Here, flip and value are cheap operations, but numpy.array() still creates a copy.
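
Note that the array above has one row per table column (shape 2 by 3). scikit-learn estimators expect arrays of shape (n_samples, n_features), so a transpose may be needed before fitting. A NumPy-only sketch, with a dict of lists standing in for `q.t.flip` (an assumption made so that the block runs without PyQ):

```python
import numpy as np

# Stand-in for the table t: column name -> column values.
columns = {"a": [0, 1, 2], "b": [3, 4, 5]}

by_column = np.array(list(columns.values()))  # same layout as the output above
by_sample = by_column.T                       # one row per table row

print(by_column.shape, by_sample.shape)
```

The transpose is a view of `by_column`, so it does not add a second copy.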

trias702 commented 6 years ago

Not exactly, since that still requires communicating with Python via global variables. I'm looking to apply the same design pattern as before, but was hoping that it would work for numpy matrices since they might be easier to bind with inside q:

cat test.p
from pyq import q,K
import numpy as np

# define python to q comms funcs
def q_set(name, value):
    globals()[str(name)] = value
    return K(str(name))

def q_qt2np(x):
    return np.array(x.flip.value)

q.set('.p.set', q_set)
q.set('.p.qt2np', q_qt2np)

Then from q:

q)\l test.p
q)x:([]c1:1000?1f;c2:1000?100)
q).p.set ("Xp"; .p.qt2np enlist x)
q).p.e "print(type(Xp))"   // was hoping for numpy.ndarray, but it still prints pyq.K

I understand why PyQ cannot do this given its design philosophy; it all makes sense. It sounds like embedPy has more of the functionality I'm looking for, but sadly it doesn't work on Windows yet.

Thank you for the help though. I can do what I require for my projects with PyQ; it just won't be as efficient as I was hoping for.

abalkin commented 6 years ago

It looks like you are trying to put logic in q that belongs in Python. Why don't you do the following in your test.p:

from pyq import q,K
import numpy as np

q)x:([]c1:1000?1f;c2:1000?100)

Xp = np.array(q.x.flip.value)

Now,

q test.p
q)p)print(type(Xp))
<class 'numpy.ndarray'>
