Open Henry-Chinner opened 8 years ago
This would be insanely great, yes! :)
However, I think context managers might be a more Pythonic way to approach this. Also, I want to avoid breaking NumPy compatibility. This is how I imagine it should work:
with ca.init_stream(1) as stream:
    c = ca.dot(a, b)
with ca.init_stream(2) as stream:
    f = ca.dot(d, e)
That looks great.
While on the topic, async data transfer could be set up using the same method.
with ca.init_stream(1) as stream:
    a = ca.array(np_a)
    b = ca.array(np_b)
    c = ca.dot(a, b)
with ca.init_stream(2) as stream:
    d = ca.array(np_d)
    e = ca.array(np_e)
    f = ca.dot(d, e)
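For illustration, here is a minimal sketch of how a context manager could select the stream without changing NumPy-style signatures. It is pure Python and does not touch the GPU. `init_stream` mirrors the name proposed in this thread, while `current_stream`, `launch_log`, and the stand-in `dot` are hypothetical names invented for the example and are not cudarray's actual API.

```python
import contextlib
import threading

# Hypothetical sketch: none of these names are cudarray's actual API.
_state = threading.local()


def current_stream():
    """Return the id of the active stream (0 stands for the default stream)."""
    return getattr(_state, "stream", 0)


@contextlib.contextmanager
def init_stream(stream_id):
    """Make `stream_id` the active stream inside the `with` block."""
    previous = current_stream()
    _state.stream = stream_id
    try:
        yield stream_id
    finally:
        # Restore the enclosing stream, even if the body raised.
        _state.stream = previous


launch_log = []


def dot(a, b):
    """Stand-in for ca.dot: records the stream it would be enqueued on."""
    launch_log.append(("dot", current_stream()))
    return sum(x * y for x, y in zip(a, b))


with init_stream(1):
    c = dot([1, 2], [3, 4])
with init_stream(2):
    f = dot([5, 6], [7, 8])
```

Because the operation looks up the active stream itself, `dot(a, b)` keeps its NumPy-compatible signature. In a real implementation the two blocks could only overlap on the GPU if the calls inside them returned before the work finished, as CUDA kernel launches do with respect to the host.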
It would be great if cudarray could support CUDA streams to maximize utilization of GPU resources.
Something like,
ca.dot(a, b, stream=1, out=c)
ca.dot(d, e, stream=2, out=f)