ashvardanian / SimSIMD

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0
793 stars, 42 forks

Memory usage problems #142

Open Charlyo opened 1 week ago

Charlyo commented 1 week ago

Hello @ashvardanian !

I've been meaning to use simsimd.cdist with the Hamming distance on np.uint8 arrays, but I'm running into long execution times as well as huge memory consumption.

I'm trying several packages (faiss, numpy, scikit-learn, etc.) to compute the pairwise Hamming distance between two matrices using only one core:

import time

import numpy as np
import simsimd

# 1,000 and 1,600,000 random 64-byte vectors
a_array = np.random.randint(0, 255, (1000, 64), dtype=np.uint8)
b_array = np.random.randint(0, 255, (1_600_000, 64), dtype=np.uint8)

start = time.time()
dist = np.array(simsimd.cdist(b_array, a_array, 'hamming'), dtype=np.uint8)
end = time.time()
print("SimSIMD ", end - start)

However, this takes more than a minute and uses up to 14GB according to the memory_profiler package. I assumed the resulting dist would also be a byte matrix, since the input is a byte matrix.

Am I missing something? I would expect it to take around 10s.

My CPU is an i7 10750h.

Best regards

ashvardanian commented 1 week ago

You are computing 1.6 billion distances, right? That's supposed to take over a minute on one core. What did you get with other packages?

You can't safely clip the result to uint8, as in some cases all 256 bits might be different, and the type will overflow.
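
The overflow concern can be illustrated with plain NumPy. This is only a sketch: the value 256 comes from the comment above, while 512 assumes the 64 input bytes are compared as 512 individual bits, which this thread does not confirm. Integer inputs are used here so that the wrap-around is well defined.

```python
import numpy as np

# Candidate distance values: the uint8 maximum, the 256 mentioned above,
# and 512 (assumption: 64 bytes compared bit by bit).
distances = np.array([255, 256, 512], dtype=np.int64)

as_u8 = distances.astype(np.uint8)    # wraps modulo 256
as_u16 = distances.astype(np.uint16)  # large enough for all three values

print(as_u8.tolist())   # [255, 0, 0]
print(as_u16.tolist())  # [255, 256, 512]
```

A completely different pair of vectors would then be stored as distance 0 in a uint8 matrix, which is why a wider type is needed.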

Charlyo commented 1 week ago

You are computing 1.6 billion distances, right? That's supposed to take over a minute on one core. What did you get with other packages?

Currently, compiling it with main-dev got it down to 17s. 🥳

Nonetheless, I'm still observing 14GB of memory usage, which seems excessive.

You can't safely clip the result to uint8, as in some cases all 256 bits might be different, and the type will overflow.

You are correct! Might have to clip it to uint16.

Charlyo commented 1 week ago

Can I somehow specify that I want at most uint16 results?

ashvardanian commented 1 week ago

I don't think we currently have that functionality. Any chance you can contribute that?
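
Until such an option exists, one possible workaround is to compute the distances in row chunks and store each chunk directly as uint16. The sketch below uses plain NumPy rather than SimSIMD, so it will be much slower than the SIMD kernels. The function name `hamming_cdist_u16` and the `chunk_rows` parameter are hypothetical, and the distance is assumed to be a bit-level Hamming distance.

```python
import numpy as np

# POPCOUNT[v] is the number of set bits in the byte value v.
POPCOUNT = (
    np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1)
    .sum(axis=1)
    .astype(np.uint16)
)


def hamming_cdist_u16(a, b, chunk_rows=256):
    """Bit-level Hamming distances between rows of a and b, stored as uint16.

    Hypothetical helper; a and b are 2D uint8 arrays with equal row length.
    """
    out = np.empty((a.shape[0], b.shape[0]), dtype=np.uint16)
    for start in range(0, a.shape[0], chunk_rows):
        stop = min(start + chunk_rows, a.shape[0])
        # XOR each row of the chunk with each row of b:
        # the result has shape (chunk, len(b), bytes_per_row).
        diff = a[start:stop, None, :] ^ b[None, :, :]
        # Count differing bits per byte, then sum over the bytes of each pair.
        out[start:stop] = POPCOUNT[diff].sum(axis=2, dtype=np.uint16)
    return out


a = np.zeros((3, 64), dtype=np.uint8)
b = np.full((2, 64), 255, dtype=np.uint8)
d = hamming_cdist_u16(a, b)
print(d.dtype, d.shape, int(d.max()))  # uint16 (3, 2) 512
```

Only one chunk of intermediate results is alive at a time, so peak memory is dominated by the uint16 output matrix rather than by a full-size temporary.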

Charlyo commented 1 week ago

I have very limited knowledge of C / C++ code. What is the element size of the currently returned dist?

ashvardanian commented 1 week ago

@Charlyo, it's double, so 8-byte double-precision floating-point numbers. With ChatGPT's help you can probably find a way to bend the python/lib.c to your will ;)
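
The 8-byte element size roughly accounts for the reported memory usage. A back-of-the-envelope estimate for the shapes in the original post, with the hypothetical helper `result_bytes`, might look like this:

```python
import numpy as np


def result_bytes(n_a, n_b, dtype):
    """Size in bytes of a dense n_a x n_b distance matrix of the given dtype."""
    return n_a * n_b * np.dtype(dtype).itemsize


# Shapes from the original post: 1,000 x 1,600,000 = 1.6 billion distances.
for dtype in (np.float64, np.uint16, np.uint8):
    size_gb = result_bytes(1_000, 1_600_000, dtype) / 1e9
    print(f"{np.dtype(dtype).name}: {size_gb:.1f} GB")
# float64: 12.8 GB
# uint16: 3.2 GB
# uint8: 1.6 GB
```

If the float64 result and the uint8 copy made by `np.array(..., dtype=np.uint8)` are alive at the same time, the total is about 14.4 GB, which is consistent with the roughly 14GB observed, although the thread does not confirm this breakdown.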