Closed ogencoglu closed 1 month ago
I also saved the numpy arrays as .npy, loaded them in jupyter and tried to run. Still the same error.
Tried to cast to np.float32 before running, still the same error.
They are of shape: (1106, 2) (422, 2)
Sure, @ogencoglu, this means your vectors don't occupy a contiguous buffer in memory and are strided, meaning there are gaps in memory between neighboring rows or matrix cells. Can you please share the output of:
print(matrix1.__array_interface__)
print(matrix2.__array_interface__)
There is also a workaround:
matrix1 = np.ascontiguousarray(matrix1)
matrix2 = np.ascontiguousarray(matrix2)
distance_matrix2 = simsimd.cdist(matrix1, matrix2, metric="sqeuclidean")
Please let me know what the output is and if the workaround helps. I can probably extend the interface to support at least row strides.
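To make the "strided" idea concrete, here is a minimal NumPy-only sketch. The arrays `base` and `view` are made up for illustration and are not the reporter's data. It shows a row-strided view, how to check contiguity via `flags`, and that `np.ascontiguousarray` copies only when it has to:

```python
import numpy as np

# Illustrative data only: a C-contiguous (4, 3) float32 array.
base = np.arange(12, dtype=np.float32).reshape(4, 3)

# Taking the first two columns keeps the original row pitch of 12 bytes,
# so each row is followed by a 4-byte gap: a row-strided, non-contiguous view.
view = base[:, :2]
print(view.shape, view.strides)       # shape (4, 2), strides in bytes
print(view.flags["C_CONTIGUOUS"])     # False

# The workaround: copy into a fresh, densely packed row-major buffer.
packed = np.ascontiguousarray(view)
print(packed.strides, packed.flags["C_CONTIGUOUS"])

# For an array that is already contiguous, no copy is made.
same = np.ascontiguousarray(base)
print(np.shares_memory(same, base))
```

The packed copy can then be passed to a routine that requires contiguous input, at the cost of one extra copy per call.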
Thanks for the swift reply.
Here is the output:
{'data': (4866344960, False), 'strides': (4, 3904), 'descr': [('', '<f4')], 'typestr': '<f4', 'shape': (976, 2), 'version': 3}
{'data': (105553123344480, False), 'strides': (4, 24), 'descr': [('', '<f4')], 'typestr': '<f4', 'shape': (6, 2), 'version': 3}
and for some other case:
{'data': (6049844224, False), 'strides': (4, 4424), 'descr': [('', '<f4')], 'typestr': '<f4', 'shape': (1106, 2), 'version': 3}
{'data': (6049985024, False), 'strides': (4, 1688), 'descr': [('', '<f4')], 'typestr': '<f4', 'shape': (422, 2), 'version': 3}
When I tried the np.ascontiguousarray trick, I get:
TypeError: Input tensors must have matching datatypes, check with `X.__array_interface__`
Then I explicitly cast with .astype(float). Now it works, but all this makes it slower than scipy cdist. I don't have very large matrices, but my code calls this distance calculation hundreds of times. I guess the ascontiguousarray and casting overheads are the bottleneck.
Anyway, I ended up implementing it in pure numba, which gives several times faster results than scipy cdist. So I got my problem solved. But I will be following SimSIMD closely for sure.
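On the conversion overhead described above: one possible way to reduce it, sketched here under assumptions, is to fix layout and dtype in a single call and to do it once when the arrays are created rather than on every distance call. The helper name `prepare` and the sample arrays are hypothetical. Note that `.astype(float)` produces float64, not float32, so both inputs end up with whatever dtype the cast gives them.

```python
import numpy as np

def prepare(matrix, dtype=np.float32):
    """Hypothetical helper: return a C-contiguous array of the given dtype.

    np.ascontiguousarray accepts a dtype argument, so the layout fix and the
    cast happen in one pass instead of two separate copies.
    """
    return np.ascontiguousarray(matrix, dtype=dtype)

# Made-up inputs with mismatched dtypes and a transposed layout.
a = np.zeros((2, 1106), dtype=np.float32).T
b = np.zeros((2, 422), dtype=np.float64).T

a_ready = prepare(a)
b_ready = prepare(b)
print(a_ready.dtype, b_ready.dtype)
print(a_ready.flags["C_CONTIGUOUS"], b_ready.flags["C_CONTIGUOUS"])

# For comparison: the builtin float maps to float64 in NumPy.
print(np.zeros(3, dtype=np.float32).astype(float).dtype)
```

Whether this removes the bottleneck depends on where the arrays come from; if they are rebuilt on every call, the copy is still paid each time.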
Hey @ogencoglu! I've found the issue!
In your logs:
{'data': (4866344960, False), 'strides': (4, 3904), 'descr': [('', '<f4')], 'typestr': '<f4', 'shape': (976, 2), 'version': 3}
{'data': (105553123344480, False), 'strides': (4, 24), 'descr': [('', '<f4')], 'typestr': '<f4', 'shape': (6, 2), 'version': 3}
Both matrices truly have a non-contiguous layout. The first stride being smaller than the second indicates that you may have transposed the matrix. Overall, I recommend using a row-major layout. It will give better results with practically every framework, including Numba implementations. SimSIMD explicitly discourages such usage.
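The stride pattern in the log can be reproduced with plain NumPy. This is a sketch with zero-filled stand-in data, assuming the original array was a transposed (2, 976) float32 matrix; the actual origin of the reporter's arrays is not known.

```python
import numpy as np

# Stand-in data: a row-major (2, 976) float32 array, then its transpose.
row_major = np.zeros((2, 976), dtype=np.float32)
transposed = row_major.T  # a view, no data is moved

# The transpose swaps the strides: 4 bytes along axis 0, 976 * 4 = 3904 along axis 1.
print(transposed.shape, transposed.__array_interface__["strides"])

# For a C-contiguous array the interface reports strides as None.
print(row_major.__array_interface__["strides"])

# Building the (976, 2) matrix row-major from the start avoids the issue:
# allocate the final shape and fill the columns in place.
xs = np.zeros(976, dtype=np.float32)
ys = np.ones(976, dtype=np.float32)
points = np.empty((976, 2), dtype=np.float32)
points[:, 0] = xs
points[:, 1] = ys
print(points.strides, points.flags["C_CONTIGUOUS"])
```

With the row-major `points` array, each vector's two coordinates sit next to each other in memory, which is the layout recommended above.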
Describe the bug
I have a script and replaced scipy cdist with simsimd cdist and got the following error:
Bug: ValueError: Input vectors must be contiguous
Any pointer to what this error means?
Steps to reproduce
Basic test such as
seems to be working. Any pointer to what this error means?
If not, I will try to send the numpy .npy files somehow.
Expected behavior
To execute
SimSIMD version
5.5.0
Operating System
MacOS
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
No response
Are you open to being tagged as a contributor?
.git history as a contributor
Is there an existing issue for this?
Code of Conduct