Blosc / bcolz

A columnar data container that can be compressed.
http://bcolz.blosc.org

running diff on carray returns short arrays #20

Open btel opened 12 years ago

btel commented 12 years ago

When calculating the discrete difference of an array (np.diff) through ca.eval, carray returns a result that is too short:

import carray as ca
import numpy as np

carr = ca.arange(1000000)
diff_arr = ca.eval("np.diff(carr)", vm="python")
nd_arr = np.diff(carr)

print "Number of elements in carr:", len(carr)
print "Number of elements in diff_arr:", len(diff_arr)
print "Number of elements in nd_arr:", len(nd_arr)

On my computer this prints:

Number of elements in carr: 1000000
Number of elements in diff_arr: 999877
Number of elements in nd_arr: 999999

Derivatives calculated with ndarray and carray have different lengths.

FrancescAlted commented 12 years ago

This is because np.diff does not keep the length of the output the same as that of its operands. This is not easily supported by a blocking technique, and should be implemented as a 'toplevel' function.

In the meantime, this should at the very least raise a NotImplementedError.
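
To illustrate the blocking problem, here is a minimal sketch in plain NumPy (it does not use carray, and naive_blockwise_diff is a hypothetical helper, not part of any library). It applies np.diff to each block on its own, so every block loses one element. The block length of 8192 is an assumption: the thread does not state it, but it gives 123 blocks for 1000000 elements, which would account for the 123 missing elements reported above.

```python
import numpy as np

def naive_blockwise_diff(arr, blocklen):
    """Apply np.diff to each block independently and concatenate the results.

    Hypothetical helper for illustration only: each block yields
    len(block) - 1 values, so one element is lost per block.
    """
    pieces = [np.diff(arr[start:start + blocklen])
              for start in range(0, len(arr), blocklen)]
    return np.concatenate(pieces)

arr = np.arange(1000000)
out = naive_blockwise_diff(arr, blocklen=8192)  # 8192 is an assumed block length

print(len(out))           # 999877: one element lost in each of the 123 blocks
print(len(np.diff(arr)))  # 999999: only one element lost overall
```

The differences that go missing are exactly the ones that straddle a block boundary, which is why an expression evaluated block by block cannot reproduce np.diff without extra bookkeeping.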

FrancescAlted commented 12 years ago

A possible implementation of np.diff for carray could be:

l = len(carr) - 1
diff_arr = ca.fromiter((carr[i+1] - carr[i] for i in xrange(l)), 'f8', l)

which is about 30% faster than np.diff on my machine (but the above only supports one-dimensional arrays).
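
For comparison, a block-oriented variant can be sketched by reading one extra element per block, so that the difference across each block boundary is kept. This is only a sketch under stated assumptions: blockwise_diff and its blocklen default are hypothetical names and values, the input is any one-dimensional sequence that supports len() and slicing, and the result is returned as a plain NumPy array rather than a carray. It is not how carray itself implements anything.

```python
import numpy as np

def blockwise_diff(arr, blocklen=8192):
    """First-order difference of a 1-D sequence, computed block by block.

    Hypothetical helper: `arr` only needs len() and slicing; `blocklen`
    is an assumed default, not a value taken from carray.
    """
    n = len(arr)
    if n < 2:
        # Nothing to difference; np.diff returns an empty array here
        return np.diff(np.asarray(arr))
    pieces = []
    for start in range(0, n - 1, blocklen):
        # Read one extra element so the difference that straddles
        # the block boundary is not lost
        stop = min(start + blocklen + 1, n)
        pieces.append(np.diff(np.asarray(arr[start:stop])))
    return np.concatenate(pieces)

arr = np.arange(1000000)
out = blockwise_diff(arr)
print(len(out))                           # 999999
print(np.array_equal(out, np.diff(arr)))  # True
```

Only one block plus one element is sliced at a time, so the input never has to be decompressed in full; the output pieces are still collected in memory here, which a real container-backed version would presumably append to a compressed array instead.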