Open · Charlyo opened this issue 1 week ago
You are computing 1.6 billion distances, right? That's supposed to take over a minute on one core. What did you get with other packages?
You can't safely clip the result to uint8, as in some cases all 256 bits might be different, and the type will overflow.
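For instance, two 256-bit vectors that differ in every bit have a Hamming distance of 256, one more than uint8 can hold:

```python
# Two 256-bit vectors (32 bytes each) that differ in every single bit
a = bytes(32)               # all zero bits
b = bytes([0xFF]) * 32      # all one bits

# Bit-level Hamming distance: XOR each byte pair, count set bits
dist = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
print(dist)          # 256
print(dist & 0xFF)   # 0, the value a uint8 would silently wrap to
```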
> You are computing 1.6 billion distances, right? That's supposed to take over a minute on one core. What did you get with other packages?
Currently, compiling it with main-dev got it down to 17 s. 🥳

Nonetheless, I'm observing 14 GB of memory usage, which I feel is obnoxious.
> You can't safely clip the result to uint8, as in some cases all 256 bits might be different, and the type will overflow.
You are correct! I might have to clip it to uint16. Can I somehow specify that I want at most uint16 results?
I don't think we currently have that functionality. Any chance you can contribute that?
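In the meantime, a post-hoc downcast is one workaround, though it only shrinks the result after the full double-precision matrix has already been allocated. A sketch with made-up sizes:

```python
import numpy as np

# Stand-in for a double-precision distance matrix (sizes are made up)
dist = np.random.default_rng(0).integers(0, 257, size=(100, 100)).astype(np.float64)

# Distances of 256-bit vectors are at most 256, so uint16 is lossless here
dist_u16 = dist.astype(np.uint16)
print(dist.nbytes, dist_u16.nbytes)   # 80000 20000: a 4x storage saving
```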
I have very limited knowledge of C/C++ code. What is the element size of the currently returned dist?
@Charlyo, it's `double`, so 8-byte double-precision floating-point numbers. With ChatGPT's help you can probably find a way to bend python/lib.c to your will ;)
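That element size also explains the memory footprint. Assuming ~1.6 billion distances means roughly a 40,000 × 40,000 result (my inference from the numbers in this thread), the output matrix alone is about 12 GiB:

```python
n = 40_000                     # sqrt(1.6e9): assumed square result matrix
result_bytes = n * n * 8       # one 8-byte double per distance
print(result_bytes / 2**30)    # ~11.9 GiB before inputs or any temporaries
```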
Hello @ashvardanian!

I've been meaning to use `simsimd.cdist` for Hamming distance with `np.uint8` arrays, but I'm hitting long execution times as well as huge memory consumption. I'm trying several packages (faiss, numpy, scikit-learn, etc.) to compute the pairwise Hamming distance between two matrices using only one core:
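A minimal NumPy-only sketch of such a benchmark (not my exact code; sizes are scaled down from ~40,000 vectors of 32 bytes each, and the XOR + popcount-table approach is just one common way to do it):

```python
import numpy as np

rng = np.random.default_rng(0)
n, nbytes = 200, 32                  # scaled down from ~40,000 x 32 (256-bit vectors)
A = rng.integers(0, 256, size=(n, nbytes), dtype=np.uint8)
B = rng.integers(0, 256, size=(n, nbytes), dtype=np.uint8)

# Lookup table: number of set bits in each possible byte value
popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(axis=1)

# Pairwise bit-level Hamming distance via broadcasted XOR
xor = A[:, None, :] ^ B[None, :, :]  # shape (n, n, nbytes): the memory hog
dist = popcount[xor].sum(axis=-1)    # each entry lies in [0, 256]
print(dist.shape)                    # (200, 200)
```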
However, this takes more than a minute and uses up to 14 GB according to the memory_profiler package. I assume the resulting dist should also be a byte matrix, since the input is a byte matrix.
Am I missing something? I would expect it to take around 10s.
My CPU is an i7-10750H.
Best regards