Closed — TheBB closed this 7 years ago
I do see the convenience in this, but in this particular case I think it would be advantageous to keep the current API as is. The reason for this is that non-tensor computation is inefficient compared to tensor computation, and the API should reflect this. Having fewer options makes it easier to create programs using the library in a good way rather than just a functional way.

Besides, it is already possible to do non-tensor computation in a single line using a for-loop. It is not as elegant as your proposed solution, but I think this reflects the cumbersome backend computation perfectly: it is possible, just not as pretty or as effective.
```python
from splipy import Surface
from numpy import linspace, reshape

s = Surface()  # default unit square surface
all_u = reshape(linspace(0, 1, 100), (50, 2))  # 50 (u, v) parameter pairs
x = [s(u[0], u[1]) for u in all_u]  # evaluate one point at a time
```
> The reason for this is that non-tensor computation is inefficient compared to tensor computation, and the API should reflect this.

It does though, the tensor method is the default.

> Having fewer options makes it easier to create programs using the library in a good way rather than just a functional way.

What about a fast way? I know which one I would prefer:
```python
import splipy.volume_factory as vf
import numpy as np
import time

def timeit(func):
    """Decorator that prints the wall-clock time of each call."""
    def timed(*args, **kwargs):
        ts = time.time()
        result = func(*args, **kwargs)
        te = time.time()
        print('%s %2.2f sec' % (func.__name__, te - ts))
        return result
    return timed

# A reasonably heavy volume to evaluate: raised order, refined knot vectors
cube = vf.cube()
cube.raise_order(2, 2, 2)
cube.refine(99, 99)

@timeit
def loop(N=100, L=100):
    # evaluate one point at a time in a Python loop
    params = np.linspace(0, 1, N)
    for _ in range(L):
        a = np.array([cube(p, p, p) for p in params])
    return a

@timeit
def diag(N=100, L=100):
    # evaluate the full N^3 tensor grid, then extract the diagonal
    params = np.linspace(0, 1, N)
    for _ in range(L):
        a = cube(params, params, params)
        a = np.array([a[i, i, i, :] for i in range(N)])
    return a

@timeit
def api(N=100, L=100):
    # the proposed non-tensor API: evaluate only the N diagonal points
    params = np.linspace(0, 1, N)
    for _ in range(L):
        a = cube(params, params, params, tensor=False)
    return a

loop()
diag()
api()
```
Output:

```
loop 9.40 sec
diag 142.06 sec
api 1.90 sec
```
Well then... I stand corrected. I did not expect this much of a difference. However, when I tried to recreate your setup on my machine, these are the results:

Output:

```
loop 19.53 sec
diag 11.04 sec
api 2.52 sec
```
This is using Python 3.5.2 with scipy 0.18.1 and numpy 1.12.0. Any ideas what is causing this? Looking at `top` shows 400% CPU use during `loop` and `diag`, but 100% during `api`. Apparently it is parallelizing as much as it can, but is still outperformed by the api version. Does your diag version not parallelize?
Yeah, it depends on numpy's multicore capabilities. I ran that benchmark at home; on my work computer I apparently have a parallel-capable numpy. (Will check again tonight.)

Unfortunately I needed to use `einsum` to express the tensor product I wanted for this PR, and `einsum` is famously strictly single-threaded. I don't know if it would be worth it to find another way to do it.
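The two contraction patterns at stake can be sketched in plain numpy. This is an illustrative sketch, not splipy's actual implementation: `Bu` and `Bv` stand in for the basis-function matrices and `C` for the control points of a 2D spline, all filled with random data here.

```python
import numpy as np

# Hypothetical stand-ins: Bu[i, a] is basis function a evaluated at u_i,
# Bv likewise in the other direction, C[a, b, :] is a control point.
rng = np.random.default_rng(0)
n_pts, n_cp, dim = 4, 5, 3
Bu = rng.random((n_pts, n_cp))
Bv = rng.random((n_pts, n_cp))
C = rng.random((n_cp, n_cp, dim))

# Tensor-product evaluation: every combination (u_i, v_j)
# -> shape (n_pts, n_pts, dim)
grid = np.einsum('ia,jb,abd->ijd', Bu, Bv, C)

# Non-tensor ("zipped") evaluation: only the pairs (u_i, v_i)
# -> shape (n_pts, dim)
pts = np.einsum('ia,ib,abd->id', Bu, Bv, C)

# The non-tensor result is exactly the diagonal of the tensor grid,
# computed without ever materializing the full grid
assert np.allclose(pts, grid[np.arange(n_pts), np.arange(n_pts)])
```

The second contraction is what makes the `api` benchmark fast: it does O(N) work where the `diag` approach does O(N^2) work and then throws most of it away.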
Even without any parallelization, `einsum` still outperforms the loop version by roughly a factor of 5-10. Just make it work with Python 2 and we can pull this in.
Updated
I was curious, and after upgrading numpy these are the numbers I found running on the same machine. It appears that numpy tensor evaluation started doing clever things with the upgrade to 1.12.
```
loop 10.67 sec
diag 61.33 sec
api 3.02 sec
```

```
loop 18.32 sec
diag 12.87 sec
api 2.75 sec
```

```
loop 19.36 sec
diag 13.62 sec
api 2.89 sec
```
Other than that, this looks like a good patch, so we'll merge it in :+1:
Say you have a unit square and want to evaluate it at two arbitrary points. The current evaluation method doesn't let you do that easily; it only works on tensor-product grids. This adds an optional argument `tensor` to the evaluation functions that enables the alternative behaviour, e.g. evaluating at a list of arbitrary parameter points rather than on the full tensor grid.
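The difference between the two call modes can be sketched with plain numpy, using the unit square as a stand-in for a real `Surface` (for the unit square the mapping is the identity, so the geometry code here is trivial; the `evaluate` function and its signature are invented for illustration).

```python
import numpy as np

def evaluate(u, v, tensor=True):
    """Toy unit-square evaluator illustrating the two call modes."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if tensor:
        # all combinations (u_i, v_j): shape (len(u), len(v), 2)
        return np.stack(np.meshgrid(u, v, indexing='ij'), axis=-1)
    # zipped pairs (u_i, v_i): shape (len(u), 2)
    return np.stack([u, v], axis=-1)

u, v = [0.1, 0.7], [0.3, 0.9]
grid = evaluate(u, v)               # a full 2x2 grid of points
pts = evaluate(u, v, tensor=False)  # just the two requested points
```

With `tensor=True` you get all four combinations of the parameter values; with `tensor=False` you get exactly the two points (0.1, 0.3) and (0.7, 0.9).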