Hey.
I was just playing around with this and was trying to see if there's a way to implement this efficiently with standard libs.
My usual way to do things like this is using the scipy.sparse.coo_matrix construct:

import numpy as np
import scipy.sparse as sp

def bincount2d(x, y, bins):
    # x and y must be integer bin indices in [0, bins);
    # duplicate (x, y) pairs are summed when the matrix is densified
    return sp.coo_matrix((np.ones(x.shape[0]), (x, y)),
                         shape=(bins, bins), dtype=np.int64)
If the data were scaled so that casting it to ints would put the values in the right bins, this would work.
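To make that concrete, here is a sketch of the scaling step I mean, assuming the data lives in [0, 1) (the bincount2d definition is repeated so the snippet runs standalone):

```python
import numpy as np
import scipy.sparse as sp

def bincount2d(x, y, bins):
    # x and y must be integer bin indices in [0, bins)
    return sp.coo_matrix((np.ones(x.shape[0]), (x, y)),
                         shape=(bins, bins), dtype=np.int64)

bins = 30
x = np.random.random(10_000)
y = np.random.random(10_000)

# The scaling/cast step: multiplying and casting materializes new
# integer arrays, i.e. the data gets copied at least once
xi = (x * bins).astype(np.intp)
yi = (y * bins).astype(np.intp)

counts = bincount2d(xi, yi, bins).toarray()
```

toarray() sums duplicate (row, col) entries, which is exactly the per-bin count.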
import numpy as np
x = np.random.random(10_000_000)
y = np.random.random(10_000_000)
from fast_histogram import histogram2d
%timeit _ = histogram2d(x, y, range=[[0, 1], [0, 1]], bins=30)
153 ms ± 4.04 ms per loop

So your code would "only" be about 5x faster than this, which makes the coo_matrix approach roughly a 4x speedup over plain numpy.
Unfortunately I cheated and didn't include shifting `x` so that the data aligns with the bins. I don't think it's possible to make this work without copying the data at least once, which is why I'm giving up on this route. Thought it might be of interest, though ...